🧩 Philosophy 7h ago · DanielFilan

Retrospective on my unsupervised elicitation challenge

Less Wrong
This post contains spoilers for the unsupervised elicitation challenge of getting Claude to get my Ancient Greek homework right.

tl;dr: Opus 4.7 one-shots it; nothing else worked.

The challenge

A few weeks ago, I announced to the world my Unsupervised Elicitation Challenge (my blog, LessWrong). I’d encourage you to read that post for the context, but the tl;dr is that there was a fill-in-the-blank exercise early on in my Ancient Greek textbook that Claude Opus 4.6 didn’t fill out correctly by

