🧩 Philosophy 2d ago · ChristopherT

Research note on selective inoculation

Less Wrong
View Channel →
Research note on selective inoculation
Source ↗ 👁 0 💬 0
IntroductionInoculation Prompting is a technique to improve test-time alignment by introducing a contextual cue (like a system prompt) to steer the model behavior away from unwanted traits at inference time. Prior inoculation prompting works apply the inoculation prompt globally to every training example during SFT or RL, primarily in settings where the undesired behavior is present in all examples. This raise two main concerns including impacts towards learned positive traits and also the fact

Comments (0)

Sign in to join the discussion

More Like This

Reflections on the largest AI safety protest in US history
LessWrong · 1d ago
📰
Defending Habit Streaks
LessWrong · 1d ago
📰
Estimates of the expected utility gain of AI Safety Research
LessWrong · 1d ago
📰
The slow death of the accelerationist.
LessWrong · 1d ago
New Fatebook Android App
LessWrong · 1d ago
My forays into cyborgism: theory, pt. 1
LessWrong · 1d ago