🧩 Philosophy 14h ago · Hisku

The Goblins Are the Paperclips

Less Wrong
Last week OpenAI published "Where the goblins came from," explaining why their models started slipping creature metaphors into unrelated outputs. The story has been treated as a quirky anecdote: endearing, slightly embarrassing, fixed with a developer-prompt instruction. But I think it deserves a more interesting reading, since the goblin episode is the cleanest evidence we have for the optimization mechanics that paperclip arguments rely on, and the usual objections to those arguments don't engag

