🧩 Philosophy 4h ago · dominicq

Small models also found the vulnerabilities that Mythos found

Less Wrong
View Channel →
Source ↗ 👁 0 💬 0
We took the specific vulnerabilities Anthropic showcases in their announcement, isolated the relevant code, and ran them through small, cheap, open-weights models. Those models recovered much of the same analysis. Eight out of eight models detected Mythos's flagship FreeBSD exploit, including one with only 3.6 billion active parameters costing $0.11 per million tokens. A 5.1B-active open model recovered the core chain of the 27-year-old OpenBSD bug.

I've been more skeptical than the average rea

Comments (0)

Sign in to join the discussion

More Like This

📰
Claude Interviews Me About Writing
LessWrong · 2h ago
An apple picking model for AI R&D
LessWrong · 4h ago
📰
Catching illicit distributed training operations during an AI pause
LessWrong · 5h ago
📰
Dreams of the Future
LessWrong · 5h ago
Proof Explained: Touchette-Lloyd Theorem
LessWrong · 5h ago
📰
Pausing AI Is the Best Answer to Post-Alignment Problems
LessWrong · 6h ago