🧩 Philosophy 19h ago · oakhu

Can activation verbalizers surface an internal chain of thought?

Less Wrong
We introduce an evaluation for activation verbalizers: can they surface a target model's reasoning as it solves a math problem in a single forward pass? For open-weight NLAs, the answer seems to be: "possibly, but definitely not reliably".

Lots of important capabilities currently require AI models to reason "out loud" in a natural-language chain of thought, which means that we can monitor important parts of their thinking. It would be nice to have this same affordance for the reasoning that model

