🧩 Philosophy 6h ago · reinthal

The Changing North Star of AI Control

Less Wrong

Source ↗ 👁 0 💬 0

On December 1st, 2025, the GDM mech interp team published a LessWrong article declaring a pivot to a pragmatic approach to interpretability. Much time had been lost chasing numbers in SAE reconstruction loss; they argued that optimizing such proxies did not get them closer to their north star: making AI go well through understanding how deep neural networks process information mechanistically. Has AI control gone the way of mech interp? Is its north star of stopping harmful actions from human-le