Latest Articles
Next Token Prediction is a Misleading Term
I’m fed up of hearing about how LLMs are next token predictors, and therefore they . There are lots of philosophical objections, but fundamentally, framing AI as next token predictors in the first place is just misleading and inaccurate. Here’s why LLMs aren’t naive next token predictors.
What is Next Token Prediction
Let’s first briefly cover what “Next Token Prediction” even means. It refers to base training (also called pre-training), the first step in training an LLM. We’ll talk about the
0
1
Can ELK be brute-forced? Intertheoretic reduction
Eliciting Latent Knowledge problem for the unfamiliar:
Suppose we train a model to predict what the future will look like according to cameras and other sensors. We then use planning algorithms to find a sequence of actions that lead to predicted futures that look good to us.
But some action sequences could tamper with the cameras so they show happy humans regardless of what’s really happening. More generally, some futures look great on camera but are actually catastrophically bad.
In these cases,
0
1
James C. Scott: Seeing Like a State
Don't get me wrong, but metis is YOLO.
In 1932-33, Soviet collectivization destroyed local farming knowledge and produced a famine that killed somewhere between five and nine million people. It was one of the twentieth century’s great tragedies, and James Scott’s Seeing Like a State draws a straight line from the ideology that caused it (High Modernism, the belief that society can be rationally reorganized from above) to the disaster that followed.
But here’s a number that doesn’t appear in Sco
0
2
How to Reason about Your Health Issues
Many people make a costly mistake when reasoning about their health. Even most doctors make this mistake, because it's not caused by a lack of medical knowledge. Rather, it's caused by a lack of clear thinking.
People experience symptoms, and then they look for the root cause of their symptoms. For example, someone with heartburn or pain in their stomach might decide the root cause of their issues is excess stomach acid/GERD (GastroEsophageal Reflux Disease -- a disease affecting
0
1
Falling for the statistical parrot
If it reads confused and stupid, for once it really is part of the intended message I guess.
Epistemic status: 0.
Sun 2.30am, with Claude having helped me prepare last minute a 4h lecture I had no adequate time for. And after a long week where, as usual, Claude was the one I have been talking to more, for work and other organization, than with my wife, and far more than with anyone else, as happens to a large share of us by now I reckon. Thinking how great it is to be in home office as there are
0
1
On getting unstuck
After more than a year of trials and new models, Anthropic's Claude AI has finally managed to beat Pokémon Red. The writeup that clued me in to this is worth a read; the story of Claude's many failures leading up to its success is frankly hilarious. There's even a catchy song.
There was no clear moment when the AI went from stumbling around Mt. Moon or Silph Co. in a haze of frustration to beating the Final Four with ease. Claude just got steadily better at a bunch of things at once: memory, spat
0
1
A relatively brief explanation of Boltzmann Brains
(Initially written for the LW Wiki, but then I realized it was looking more like a post instead.)
In 1895, the physicist Ignaz Robert Schütz, who worked as an assistant to the more eminent physicist Ludwig Boltzmann, wondered if our observed universe had simply assembled by a random fluctuation of order from a universe otherwise in thermal equilibrium. The idea was published by Boltzmann in 1896, properly credited to Schütz, and has been associated with Boltzmann ever since.
The obvious objection
0
0
Benchmarking Real Work
Thanks to Megan Kinniment for helpful comments and discussion.
TL;DR: Benchmarks like HCAST undersample fuzzy (hard to evaluate) tasks, meaning they might overestimate capability on long-horizon work. To sample fuzzy tasks we need to increase judge capacity: we can either try to build automated judges that match human judgment, or reduce the human effort per grade. To do this, we propose generating fuzzy tasks as a byproduct of real SWE work: snapshot the repo and a proto-spec before starting, a
0
0
Trying to use NLAs to find out how Qwen 2.5 7B does multiplication
Neural language autoencoders were just introduced by Anthropic. In a fascinating paper, they showed that you can take the residual stream activations of a language model and then train two instantiations of that same model (an encoder and a decoder) to translate those activations into a natural language verbalisation of them and back. In theory, this is great because it literally lets us have activations explained to us, and we know that it's a faithful explanation because it can literally be tr
0
0
A Year Late, Claude Finally Beats Pokémon
Credit: ClaudePlaysPokemon Elevator Shanty by Kurukkoo
Disclaimer: like some previous posts in this series, this was not primarily written by me, but by a friend. I did substantial editing, however.
ClaudePlaysPokemon feat. Opus 4.7 has finally beaten Pokémon Red, fulfilling the challenge set over a year ago when LLMs playing Pokémon went briefly, slightly viral, until Gemini 2.5 Pro suddenly beat Pokémon Blue in May 2025, beating Anthropic at their own challenge by using a stronger harness.
Claude
0
0
Asymmetry Between Defensive and Acquisitive Instrumental Deception
Write-up of a recent research sprint looking at factors influencing strategic deception in models
TL;DR
I tested models in a controlled scenario where they could deceptively inflate self-reported performance to influence an upcoming budget decision in their favour. Varying the budget proposal around a baseline lets us measure (a) whether models exhibit an asymmetry between deception to defend against a loss vs. to opportunistically gain advantages, and (b) whether deception rates grow smoothly wit
0
4
Context Modification as a Negative Alignment Tax
Context Rot
Every LLM gets worse as its context grows. Chroma tested 18 frontier models and found performance degradation in all of them, often by double-digit percentages on tasks where short-context performance was strong. The industry calls this "context rot": the gradual degradation of response quality as irrelevant history accumulates in the context window.
The standard fix is compaction: when the context gets too long, summarize it and throw away the original. Claude Code auto-compacts at
0
5
Best Intro AI X-Risk Resource?
I'd like the best short article and video intro explainers, shooting for the 15 minute range.
At least one of the articles shouldn't be on LessWrong, because some will get turned off by this forum.
It should be simple and not require prerequisite knowledge. My parents, and ideally my grandparents, should be able to understand it. Failing that, a normal college student at an average university should be able to; or at least a STEM major.
It should have links to more details, in case someone's in
0
6
Sawtooth Problems
Red Button, Blue Button
On April 24th, 2026, Tim Urban put forth the following poll on Twitter/X:
Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press?
I love this dilemma, and I'm exhausted by it. I’ve been thinking about it for two straight weeks, and have spent nearly all t
0
6
Control Debt
Notes on the gap: what control evaluations assume implementation in labs.
It is 2027, and a frontier lab has grown suspicious: plausibly, their model is scheming. Not a surprise for the control team. For more than a year, they worked on a protocol. Trusted monitoring is tested on their benchmark setting, with all agent actions, as well as with suspiciousness-based defer-to-trusted triggers, thresholds from the red-teaming policy, and human escalation in higher risks. In simulation, the safety/useful
0
6
Could Frontier AI Researchers Collectively Slow the Race? A Conditional Pledge Mechanism
Overview
This is a project proposal and early research on the question of how and whether Frontier AI researchers (not companies themselves) might take on personal risk and pledge to conditionally pause AI development. I am looking for feedback on whether there’s a version of this that researchers might find palatable, and if so what the details might look like. I am especially interested in hearing from people with experience in frontier AI development or doing similar advocacy and outreach work
0
5
The Goblins Are the Paperclips
Last week OpenAI published Where the goblins came from, explaining why their models started slipping creature metaphors into unrelated outputs. The story has been treated as a quirky anecdote: endearing, slightly embarrassing, fixed with a developer-prompt instruction. But I think it deserves a more interesting reading, since the goblin episode is the cleanest evidence we have for the optimization mechanics that paperclip arguments rely on, and the usual objections to those arguments don't engag
0
8
Somerville Porchfest 2026
This afternoon Cecilia and I played for Somerville Porchfest, with Harris calling and Danner running sound. There was rain, but not enough to keep us from playing, or to keep folks from dancing.
We were originally planning to be on Morrison Ave, where we've been for years. Two weeks out, though, I learned that it wouldn't be possible to close Morrison this year. [1] After lots of scrambling, talking to neighbors and the city, and some help from Lance Davis, we were able to get per
0
8
The AI Industrial Explosion — Part 2: Transition Dynamics
This is Part 2 of a series on post-AGI economic growth. Part 1 established that a fully automated economy could double roughly every year using current technology. But the US economy does not currently look like a self-reproducing capital machine. It overproduces consumer goods and services relative to maximum growth, and underproduces machinery and raw metals. It cannot instantaneously switch to rapid growth, because it simply does not produce enough of the stuff that makes stuff.
Using the inp
0
4
International Law Cannot Prevent Extinction Either
The context for this post is primarily Only Law Can Prevent Extinction, but after first drafting a half-assed comment, I decided to get off my ass and write a whole-assed post.
I agree with Eliezer's main thesis that individual violence against AI researchers is both morally wrong and strategically stupid. Where I disagree is with the claim that international law can prevent extinction. It can't, for the following reasons.
I. International law is largely a fiction (especially when interests diverg
0
7
Next Token Prediction is a Misleading Term
💬 0
👁 1
Can ELK be brute-forced? Intertheoretic reduction
LessWrong · May 17, 2026
💬 0
👁 1
James C. Scott: Seeing Like a State
LessWrong · May 17, 2026
💬 0
👁 2
How to Reason about Your Health Issues
LessWrong · May 17, 2026
💬 0
👁 1
Falling for the statistical parrot
LessWrong · May 17, 2026
On getting unstuck
LessWrong · May 17, 2026
A relatively brief explanation of Boltzmann Brains
LessWrong · May 16, 2026
Benchmarking Real Work
LessWrong · May 16, 2026
Trying to use NLAs to find out how Qwen 2.5 7B does multiplication
💬 0
👁 0
A Year Late, Claude Finally Beats Pokémon
LessWrong · May 16, 2026
💬 0
👁 0
Asymmetry Between Defensive and Acquisitive Instrumental Deception
LessWrong · May 10, 2026
💬 0
👁 4
Context Modification as a Negative Alignment Tax
LessWrong · May 10, 2026
💬 0
👁 5

Best Intro AI X-Risk Resource?
LessWrong · May 10, 2026
Sawtooth Problems
LessWrong · May 10, 2026
Control Debt
LessWrong · May 10, 2026
Could Frontier AI Researchers Collectively Slow the Race? A Conditional Pledge Mechanism
LessWrong · May 10, 2026
The Goblins Are the Paperclips
💬 0
👁 8