Asymmetry Between Defensive and Acquisitive Instrumental Deception
Write-up of a recent research sprint looking at factors influencing strategic deception in models.

TL;DR

I tested models in a controlled scenario where they could deceptively inflate self-reported performance to influence an upcoming budget decision in their favour. Varying the budget proposal around a baseline lets us measure (a) whether models exhibit an asymmetry between deception to defend against a loss vs. to opportunistically gain advantages, and (b) whether deception rates grow smoothly wit