🧩 Philosophy 2d ago · ChristopherT

Research note on selective inoculation

Less Wrong

Introduction

Inoculation prompting is a technique for improving test-time alignment: a contextual cue (such as a system prompt) is introduced during training so that the model's behavior is steered away from unwanted traits at inference time. Prior work on inoculation prompting applies the inoculation prompt globally to every training example during SFT or RL, primarily in settings where the undesired behavior is present in all examples. This raises two main concerns: the impact on learned positive traits, and the fact