Research Log: Monet/PEER sparse experts
I've been looking into the Monet/PEER sparse expert papers. I think there's a lot of potential in these ideas for interpretability-by-design.
Some of what I've done so far:
Quantization experiments: PEER can be losslessly distilled to int8, and to int4 with only minor degradation. Starting from int4, you can train PEER by adding a second int4 tensor that acts as a gradient accumulation buffer (allowing incremental steps between two int4 weight values), with stochastic rounding applied to the accumulator.
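A minimal sketch of the accumulator idea, in numpy. All names and the choice of 8 accumulator ticks per int4 weight step are my own illustration, not from the PEER papers: the gradient update is expressed in fine-grained "ticks", stochastically rounded into an int4 buffer, and whole weight steps are carried out of the buffer when enough ticks accumulate.

```python
import numpy as np

def stochastic_round(x, rng):
    """Round down/up with probability given by the fractional part,
    so the rounding is unbiased in expectation."""
    lo = np.floor(x)
    return (lo + (rng.random(x.shape) < (x - lo))).astype(np.int32)

def int4_step(w_q, acc_q, grad, lr, scale, rng, substeps=8):
    """One SGD step on int4 weights using an int4 accumulation buffer.

    w_q   : int4 weights in an int8 container, range [-8, 7]
    acc_q : int4 accumulator holding sub-step residue, range [-7, 7]
    grad  : float32 gradient w.r.t. the dequantized weights (w_q * scale)

    One full int4 weight step equals `substeps` accumulator ticks, so
    updates smaller than one quantization level still make progress.
    """
    # Express the update in accumulator ticks and round stochastically.
    ticks = -lr * grad / scale * substeps
    acc = stochastic_round(acc_q.astype(np.float32) + ticks, rng)
    # Carry whole weight steps out of the accumulator; truncate toward
    # zero so the residue stays inside the signed int4 range.
    carry = np.fix(acc / substeps).astype(np.int32)
    acc = acc - carry * substeps
    w_q = np.clip(w_q.astype(np.int32) + carry, -8, 7)
    return w_q.astype(np.int8), acc.astype(np.int8)
```

With a per-step tick update well below 1 (e.g. lr=0.01, scale=0.5 gives 0.16 ticks per unit gradient), the stochastic rounding is what lets the weights drift in the right direction over many steps instead of stalling at zero update.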