A sudoku-solving transformer represents the board by substructure, not by cell
Source ↗
👁 0
💬 0
tl;dr: a transformer trained on sudoku solving traces with backtracking maintains the board state per substructure linearly in the residual stream The main goal of this post is to understand if a transformer trained on solving traces creates a "world model" and uses it during the solving process. To do so, I trained a transformer that autoregressively predicts the next placement in a solution trace, mainly following Giannoulis et al. and Shah et al., and looked if the state is represented in the
Comments (0)