🧩 Philosophy 7h ago · Brendan Long

Training a Transformer to Compose One Step Per Layer (and Proving It)

LessWrong
I'm working on an experiment comparing the internal representations of two architectures as they solve a sequential algorithm, but training models to actually use a sequential algorithm is surprisingly hard. The optimization landscape makes it easier for models to learn parallel algorithms or to memorize lookup tables, so I needed specific architectural and training decisions to get models to learn the sequential algorithm. Even with all of these tricks, the results are seed-dependent…
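To make the kind of task concrete, here is a minimal sketch of a sequential composition problem of the sort described. This is an illustrative assumption, not the author's actual experiment: composing a chain of permutations where the only obvious algorithm applies one step at a time, so a depth-L model could in principle execute one composition step per layer.

```python
import itertools
import random

def compose_chain(start, perms):
    """Apply each permutation in order -- the sequential algorithm.

    Returns the final state and the trace of intermediate states,
    which is exactly what layer-by-layer probes of the residual
    stream would look for in a model trained on this task.
    """
    state = start
    trace = [state]
    for p in perms:
        state = p[state]  # one sequential step
        trace.append(state)
    return state, trace

# All permutations over 3 elements, each a tuple mapping index -> image.
PERMS = list(itertools.permutations(range(3)))

random.seed(0)
chain = [random.choice(PERMS) for _ in range(4)]
final, trace = compose_chain(0, chain)
```

A parallel-friendly alternative (e.g. composing the permutations pairwise in a tree) or a lookup table over whole chains would produce the same input/output pairs, which is why, as the post notes, nothing about the data alone forces the model to learn the step-by-step version.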

