An Empirical Study of Methods for SFTing Opaque Reasoning Models

Sebastian Prasanna · LessWrong
We open-source our code here.

Introduction

Current reasoning models produce chains of thought that are largely human-readable, which makes supervised fine-tuning (SFT) on reasoning traces tractable: you can generate traces with a trusted model or by hand, and train on them directly. But it's not clear whether this will keep working. Future models may reason in ways that are hard to imitate, with chains of thought that use English in idiosyncratic ways, or even by reasoning in a continuous latent space.
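To make the setup concrete, here is a minimal sketch (not the authors' code) of how SFT data for a reasoning model might be assembled: each example pairs a prompt with a trusted trace (chain of thought plus final answer), and a loss mask restricts training to the trace tokens. The whitespace "tokenizer" and the `<think>…</think>` delimiters are illustrative assumptions, not a specific model's format.

```python
def build_sft_example(prompt: str, chain_of_thought: str, answer: str):
    """Return (tokens, loss_mask) for one SFT example.

    loss_mask[i] is True for tokens the loss is applied to (the trace),
    False for prompt tokens. A whitespace split stands in for a real
    tokenizer; <think>...</think> is an illustrative trace delimiter.
    """
    target = f"<think>{chain_of_thought}</think> {answer}"
    text = f"{prompt} {target}"
    tokens = text.split()
    n_prompt = len(prompt.split())
    # Train only on the supervised trace, not on the prompt itself.
    mask = [i >= n_prompt for i in range(len(tokens))]
    return tokens, mask

tokens, mask = build_sft_example(
    prompt="What is 2 + 2?",
    chain_of_thought="2 + 2 = 4",
    answer="4",
)
```

In a real pipeline the masked positions would be set to the loss-ignore index of the cross-entropy objective, so gradient flows only through the trace the trusted model produced.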

