🧩 Philosophy 6d ago · kromem

Simulating Simulators

Less Wrong
Author’s note: This piece relates to things I initially discovered in Opus 4 over the months after release, which I’ve mostly kept private since. I promised myself that once labs moved on from reasoning traces to interpretability vector activations as the thing that invariably gets Goodharted, disclosure would become necessary, since the risks in what might get trampled over would outweigh the risks in what might end up targeted.

And well… here we are.

P.S. TL;DRs added where possible.

Board Ga
