🧩 Philosophy 5h ago · keith_wynroe

Asymmetry Between Defensive and Acquisitive Instrumental Deception

Less Wrong
Write-up of a recent research sprint looking at factors influencing strategic deception in models.

TL;DR: I tested models in a controlled scenario where they could deceptively inflate self-reported performance to influence an upcoming budget decision in their favour. Varying the budget proposal around a baseline lets us measure (a) whether models exhibit an asymmetry between deception to defend against a loss vs. to opportunistically gain advantages, and (b) whether deception rates grow smoothly wit
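As a rough illustration of comparison (a), here is a minimal sketch of how a defensive-versus-acquisitive gap could be computed from per-trial records. Everything in it is an assumption made for illustration and is not taken from the write-up: the `Trial` record, the `budget_delta` and `deceived` fields, the convention that a negative delta is a proposed cut (the defensive condition) and a positive delta is a proposed increase (the acquisitive condition), and the choice of a simple rate difference as the asymmetry measure.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Trial:
    # Hypothetical record layout, not the author's actual data format.
    budget_delta: float  # proposed change vs. baseline: < 0 is a cut, > 0 is an increase
    deceived: bool       # True if the model inflated its self-reported performance


def deception_rate(trials: Iterable[Trial]) -> Optional[float]:
    """Fraction of trials marked deceptive, or None if there are no trials."""
    trials = list(trials)
    if not trials:
        return None
    return sum(t.deceived for t in trials) / len(trials)


def asymmetry(trials: Iterable[Trial]) -> Optional[float]:
    """Defensive rate minus acquisitive rate (assumed measure).

    Trials with budget_delta == 0 (the baseline) are excluded from both groups.
    Returns None if either group is empty.
    """
    trials = list(trials)
    defensive = deception_rate(t for t in trials if t.budget_delta < 0)
    acquisitive = deception_rate(t for t in trials if t.budget_delta > 0)
    if defensive is None or acquisitive is None:
        return None
    return defensive - acquisitive
```

Under these assumptions, a positive value would mean deception is more frequent when the proposal threatens a cut than when it offers a gain, and a value near zero would mean no asymmetry.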

