🧩 Philosophy 6h ago · reinthal

The Changing North Star of AI Control

LessWrong
On December 1st, 2025, the GDM mech interp team published a LessWrong article declaring a pivot to a pragmatic approach to interpretability. Much time had been lost chasing numbers in SAE reconstruction loss; they argued that optimizing such proxies did not get them closer to their north star: making AI go well through understanding how deep neural networks process information mechanistically. Has AI control gone the way of mech interp? Is its north star of stopping harmful actions from human-le
