Computation in Superposition: Two Handcrafted Models
Source ↗
👁 0
💬 0
Many interpretability researchers (ourselves included) believe that neural networks store knowledge in superposition—that is, networks encode more facts than they have individual components. A natural extension of this idea is that networks also perform computation on knowledge that lives in superposition. Despite the centrality of this concept, there are few concrete examples of what computation in superposition actually looks like in practice.In this post, we study a toy memorization task wher
Comments (0)