🧩 Philosophy 12h ago · Nathan Helm-Burger

Research Log: Monet/PEER sparse experts

LessWrong
I've been looking into the Monet/PEER sparse expert papers. I think there's a lot of potential in these ideas for interpretability-by-design.
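For context, the core PEER mechanism is product-key retrieval over a very large pool of single-neuron experts. A minimal brute-force sketch (my own shapes and names; the actual paper retrieves top-k per sub-key set first for efficiency rather than scoring all n² experts):

```python
import numpy as np

def peer_forward(x, U, V, K1, K2, top_k=4):
    """Minimal single-head PEER layer: product-key retrieval over N = n*n
    single-neuron experts. Expert i has down-projection U[i] and
    up-projection V[i]; sub-key sets K1, K2 each hold n half-width keys."""
    n = K1.shape[0]                                # sqrt of the expert count
    q1, q2 = x[: x.size // 2], x[x.size // 2 :]    # split the query in half
    s1, s2 = K1 @ q1, K2 @ q2                      # (n,) scores per sub-key set
    # Score of expert (i, j) is s1[i] + s2[j]; here we brute-force the
    # full n*n score grid and take the global top-k (flat index = i*n + j).
    scores = (s1[:, None] + s2[None, :]).ravel()   # (n*n,)
    idx = np.argpartition(scores, -top_k)[-top_k:]
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                                   # softmax over retrieved experts
    # Each expert is a single hidden neuron: v_i * relu(u_i . x).
    h = np.maximum(U[idx] @ x, 0.0)                # (top_k,) activations
    return (w * h) @ V[idx]                        # (d,) output
```

Because each expert is one neuron, every retrieved expert's contribution is individually inspectable, which is where the interpretability-by-design appeal comes from.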
Some of what I've done so far:


Quantization experiments: PEER can be losslessly distilled to int8, and distilled to int4 with only minor degradation. From int4, you can train PEER by adding a second int4 tensor that acts as a gradient accumulation buffer (allowing incremental steps between two adjacent int4 values), with some stochastic rounding on the accumulation.

