🧩 Philosophy 4h ago · Lun

Trained steering vectors may work as activation oracles

Less Wrong
Inspired by @Eriskii's recent finding that trained steering vectors can teach a base model to act as an assistant, I replaced the Activation Oracle paper's trained LoRA with a far smaller set of per-layer trained steering vectors. The eval results were surprisingly good, far better than I anticipated from the tiny parameter count.

Trained per-layer steering vectors on Qwen3-8B as an activation oracle
Standard activation injection mechanism with " ?" placeholders
Collected activation ranges (full sequence
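
To give a rough sense of why the parameter count is so small, here is a minimal sketch in plain Python. It is not the post's actual code. The helper names (`steering_vector_params`, `lora_params`, `apply_steering`) are hypothetical, and the layer count, hidden size, LoRA rank, and choice of adapted matrices are illustrative assumptions, not figures taken from the post or the Activation Oracle paper.

```python
def steering_vector_params(n_layers, d_model):
    """One trainable vector of length d_model per layer."""
    return n_layers * d_model


def lora_params(n_layers, rank, adapted_shapes):
    """LoRA adds two low-rank factors per adapted weight matrix:
    A with shape (rank, d_in) and B with shape (d_out, rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in adapted_shapes)
    return n_layers * per_layer


def apply_steering(hidden, vector):
    """Add the same steering vector to the hidden state at every token position.
    `hidden` is a list of per-token lists; `vector` is one list of floats."""
    return [[h + v for h, v in zip(token, vector)] for token in hidden]


# Illustrative values only (assumptions; check the real model config and LoRA setup).
N_LAYERS = 36    # assumed layer count
D_MODEL = 4096   # assumed hidden size
RANK = 16        # hypothetical LoRA rank
ADAPTED = [(4096, 4096), (4096, 4096)]  # hypothetical: two square matrices per layer

steer = steering_vector_params(N_LAYERS, D_MODEL)
lora = lora_params(N_LAYERS, RANK, ADAPTED)
print(f"steering vectors: {steer:,} params")
print(f"LoRA (hypothetical config): {lora:,} params")
print(f"ratio: {lora / steer:.0f}x")
```

Under these assumed numbers the steering vectors use 64 times fewer parameters than the hypothetical LoRA. The real ratio depends on the rank and on which matrices the LoRA actually adapts.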
