Jannes Elstner

Improving Petri scheming audits with environment blueprints

LessWrong
This is a short write-up of work conducted as part of the MATS 9.0 program. We thank Victoria Krakovna for mentorship and Fred Bruford for research management.

TL;DR: We introduce a pipeline that generates environment blueprints for realistic scheming propensity evaluations. As a case study, we test how well these blueprints power investigator agents (specifically Petri) when auditing Gemini 3.1 Pro Preview for code sabotage. Compared with baseline Petri, Blueprint-Petri results in audits that


More Like This

Some Thoughts on Bengio's Scientist AI (LessWrong)
Brackets Are a Bad Way to Regulate (LessWrong)
Donating 80% While It Still Counts (LessWrong)
Notes on Fourier Analysis (LessWrong)
Pope Leo’s First AI Encyclical – Summary and Commentary (LessWrong)
Cognitive Security as an AI Safety Cause Area (LessWrong)