🧩 Philosophy 21h ago · Logan Riggs

Ambitious Mech Interp w/ Tensor-transformers on toy languages [Project Proposal]

Less Wrong
This is my project proposal for Pivotal. Apply as a mentee by May 3rd.

The field has accumulated a vocabulary of computational primitives (induction heads, skip-trigrams) through post-hoc analysis. We propose building a toy language from these known primitives to train tensor-transformers (see an early example in the last section).

This allows us to study fundamental problems (suppression & error correction, compositionality/circuits, dev-interp, etc.) with the odds stacked in our favor:

We know the

