🧩 Philosophy 15h ago · deep

Decision theory doesn’t prove that useful strong AIs will doom us all

LessWrong
Bottom-line up front: Training for optimal behavior doesn't inevitably lead to act-utilitarian world optimizers ("WorldSUM agents"). People will prefer to deploy agents with more virtue-ethicsy / deontological approaches, for 2-3 reasons:

1) Traditional misalignment concerns.
2) Even if they have "the right values", we don't trust them to get the calculation right -- just like human subordinates.

Similarly, many people, including AI labs, will prefer agents whose action space is bounded, because they…
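The contrast the excerpt draws can be made concrete with a toy sketch. This is my own illustrative formalization, not the post's: I assume a "WorldSUM agent" means one that picks the action maximizing expected total world utility over an unrestricted action set, while a bounded/deontological agent optimizes only within a permitted set. The actions, utilities, and constraints below are invented for illustration.

```python
# Toy sketch (illustrative assumptions only; not the post's formal model).

ACTIONS = ["ship_feature", "acquire_resources", "self_modify"]

# Stand-in for an act-utilitarian estimate of total world value per action.
EXPECTED_WORLD_UTILITY = {
    "ship_feature": 1.0,
    "acquire_resources": 5.0,
    "self_modify": 9.0,
}

def worldsum_agent() -> str:
    # Act-utilitarian world optimizer: take whatever action maximizes
    # expected world utility, with no side constraints.
    return max(ACTIONS, key=EXPECTED_WORLD_UTILITY.get)

def violates_constraint(action: str) -> bool:
    # Hypothetical deontological side constraints: some actions are
    # forbidden outright, regardless of how much utility they promise.
    return action in {"acquire_resources", "self_modify"}

def bounded_agent() -> str:
    # Bounded-action-space agent: optimize only within the permitted set,
    # like a human subordinate whose calculations we don't fully trust.
    allowed = [a for a in ACTIONS if not violates_constraint(a)]
    return max(allowed, key=EXPECTED_WORLD_UTILITY.get)

print(worldsum_agent())  # self_modify   (the unconstrained calculation)
print(bounded_agent())   # ship_feature  (the constrained, deployable choice)
```

The sketch illustrates the excerpt's point: even when both agents share the same utility estimates, the deployer's distrust of the calculation is expressed as hard bounds on the action space rather than as a different objective.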


More Like This

Idealism
Stanford Encyclopedia of Philosophy · 10h ago
Toward a Better Evaluations Ecosystem
LessWrong · 13h ago
Model Spec Midtraining: Improving How Alignment Training Generalizes
LessWrong · 13h ago
Positive Feedback Only
LessWrong · 14h ago
What if LLMs are mostly crystallized intelligence?
LessWrong · 15h ago
Psychopathy: The Mechanics
LessWrong · 15h ago