🧩 Philosophy 4h ago · Lun

Trained steering vectors may work as activation oracles

Less Wrong

Inspired by @Eriskii's recent finding that trained steering vectors can teach a base model to act as an assistant, I replaced the trained LoRA from the Activation Oracle paper with a far smaller set of trained per-layer steering vectors. The eval results were surprisingly good, far better than I anticipated given the tiny parameter count.

- Trained per-layer steering vectors on Qwen3-8B as an activation oracle
- Standard activation injection mechanism with " ?" placeholders
- Collected activation ranges (full sequence
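The following is a minimal pure-Python sketch of the two mechanisms and the parameter-count contrast. It is not the author's implementation. It makes three assumptions. First, each layer's steering vector is added to that layer's residual-stream output at every position. Second, the " ?" injection uses norm-matched addition, h' = h + ||h|| * a / ||a||. This is a reading of the Activation Oracle setup, not something this post states. Third, Qwen3-8B has hidden_size=4096, num_hidden_layers=36, intermediate_size=12288, and 8 KV heads of dimension 128. All function names are hypothetical. The LoRA rank of 8 is purely illustrative and is not the rank used in the Activation Oracle paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]  # maps per-position residuals to residuals


def add(u: Sequence[float], v: Sequence[float]) -> Vector:
    """Elementwise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def norm(v: Sequence[float]) -> float:
    """Euclidean (L2) norm."""
    return math.sqrt(sum(x * x for x in v))


def apply_steering(residuals: List[Vector], steer: Vector) -> List[Vector]:
    """Add one layer's steering vector at every position (assumption)."""
    return [add(h, steer) for h in residuals]


def forward_with_steering(
    x: List[Vector], layers: Sequence[Layer], steering_vectors: Sequence[Vector]
) -> List[Vector]:
    """Run frozen layers, adding that layer's steering vector after each one.

    In training, only the steering vectors would receive gradients.
    """
    if len(layers) != len(steering_vectors):
        raise ValueError("need exactly one steering vector per layer")
    for layer, steer in zip(layers, steering_vectors):
        x = apply_steering(layer(x), steer)
    return x


def inject_at_placeholders(
    tokens: Sequence[str],
    residuals: List[Vector],
    activations: Sequence[Vector],
    placeholder: str = " ?",
) -> List[Vector]:
    """Norm-matched addition at placeholder positions (assumed rule):
    h' = h + ||h|| * a / ||a||, one collected activation per placeholder token.
    """
    positions = [i for i, tok in enumerate(tokens) if tok == placeholder]
    if len(positions) != len(activations):
        raise ValueError("need exactly one activation per placeholder token")
    out = [list(h) for h in residuals]
    for pos, a in zip(positions, activations):
        scale = norm(out[pos]) / norm(a)  # assumes a is nonzero
        out[pos] = add(out[pos], [scale * x for x in a])
    return out


def steering_param_count(num_layers: int, d_model: int) -> int:
    """One d_model-sized vector per layer."""
    return num_layers * d_model


def lora_param_count(num_layers: int, rank: int, shapes: Sequence[Tuple[int, int]]) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) for each adapted matrix."""
    return num_layers * sum(rank * (d_in + d_out) for d_in, d_out in shapes)


# Assumed Qwen3-8B per-layer projection shapes as (d_in, d_out).
QWEN3_8B_SHAPES = [
    (4096, 4096),   # q_proj: 32 heads x 128
    (4096, 1024),   # k_proj: 8 KV heads x 128
    (4096, 1024),   # v_proj
    (4096, 4096),   # o_proj
    (4096, 12288),  # gate_proj
    (4096, 12288),  # up_proj
    (12288, 4096),  # down_proj
]

if __name__ == "__main__":
    print(steering_param_count(36, 4096))            # 147456
    print(lora_param_count(36, 8, QWEN3_8B_SHAPES))  # 21823488 (hypothetical rank 8)
    toks = ["Q", ":", " ?", " ?", "A"]
    h = [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0], [0.0, 2.0], [1.0, 1.0]]
    print(inject_at_placeholders(toks, h, [[2.0, 0.0], [0.0, 4.0]]))
```

Under these assumed shapes, the steering vectors add 36 × 4096 = 147,456 trainable parameters. A rank-8 LoRA on every projection would add 21,823,488, exactly 148 times more. The norm-matched rule keeps each injected activation on the same scale as the residual it lands on, whatever layer it was collected from.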