Latest Articles
Reflections on the largest AI safety protest in US history
On a sunny Saturday afternoon two weeks ago, I was sitting in Dolores Park, watching a man get turned into a cake. It was, I gather, his birthday, and for reasons (maybe something to do with Scandinavia?) his friends had decided to celebrate by taping him to a tree and dousing him with all manner of liquids and powders. At the end, confetti flew everywhere. It was hard not to notice, and hard not to watch. Something about the vibe was inspiring… I felt like maybe we should be doing something li
0
0
Defending Habit Streaks
I have a lot of habit streaks. Some of the streaks I have going at the moment:
Studied Anki cards for Chinese every day for 8 months*
Meditated every day for the past 1.5 years*
Flossed every day for 6+ months*
In fact I think quite a lot of my identity is connected to these streaks at this point, and that’s part of what sustains them[1]. But there are a lot of other things you can do to make habits and their associated streaks more sustainable.
It’s helpful if they are small enoug
0
0
Estimates of the expected utility gain of AI Safety Research
When thinking about AI risk, I often wonder how materially impactful each hour of my time is, and I think that this may be useful for other people to know as well, so I spent a couple of hours making a couple of estimates. I basically expect that a tonne of people have put a bunch more time into this than me, but this is nice to have as a rough sketch to point people to. I'm going to make 3 estimates: an underestimate, my best-guess estimate and (what I think is) an overestimate. Starting facts[1]
0
0
The slow death of the accelerationist.
The year is 2024. Summer has just begun. National discourse, for now, is solely focused on the upcoming presidential election, with many a journalist or political commentator critiquing the current, rather fiery state of political affairs. Tech and its associated public commentary has centered upon artificial intelligence as its new darling, hailing OpenAI as a savior for what was once deemed an idea stuck in science fiction, and looking to burgeoning startups such as Cursor and Windsurf as ear
0
0
New Fatebook Android App
tldr; get the new Fatebook Android app! What is Fatebook? Fatebook.io is a website[1] for easily tracking your predictions and becoming better calibrated at them. I like it a lot, and find it convenient for practicing probabilistic thinking. [Image: The Fatebook.io dashboard] That said, I've found Fatebook's mobile version to be clunky, and its email-based notifications to be less-than-ideal... which leads me to: The New Android App. Over the past two weeks, I've made an Android app that wraps the Fatebook API,
0
0
My forays into cyborgism: theory, pt. 1
In this post, I share the thinking that lies behind the Exobrain system I have built for myself. In another post, I'll describe the actual system. I think the standard way of relating to LLM/AIs is as an external tool (or "digital mind") that you use and/or collaborate with. Instead of you doing the coding, you ask the LLM to do it for you. Instead of doing the research, you ask it to. That's great, and there is utility in those use cases. Now, while I hardly engage in the delusion that humans can
0
0
Unmathematical features of math
(Epistemic status: I consider the following quite obvious and self-evident, but decided to post anyways.[1]) "Mathematics is a social activity done by mathematicians." (Paul Erdős, probably) There've been a few attempts to create mathematical models of math. The examples that come to my mind are Gödelian Numbering (GN) and Logical Induction (LI). Feel free to suggest more in the comments, but I'll use those as my primary reference points. In this post, I want to contrast them with the way human mathe
0
0
Is that uncertainty in your pocket or are you just happy to be here?
Hi, I'm kromem, and this is my 5th annual Easter 'shitpost' as part of a larger multi-year cross-media project inspired by 42 Entertainment, and built around a central premise: Truth clusters and fictions fractalize. (It's been a bit of a hare-brained idea continuing to gestate from the first post on a hypothetical Easter egg in a simulation. While this piece fits in with the larger koine of material, it can also be read on its own, so if you haven't been following along down the rabbit hole, no
0
0
Unsweetened Whipped Cream
I'm a huge fan of whipped cream. It's rich, smooth, and fluffy, which
makes it a great contrast to a wide range of textures common in baked
goods. And it's usually better without adding sugar.
Desserts are usually too sweet. I want them to have enough sugar that
they feel like a dessert, but it's common to have way more than that.
Some of this is functional: in most cakes the sugar performs a
specific role in the structure, where if you cut the sugar the
texture will be much worse. This
0
0
11 pieces of advice for children
I came up with these principles when I was a child myself. Don’t be a sheep 🐑. Avoid mindlessly copying others. Resist the urge towards conformity. Think for yourself whether something is worth doing and useful for your goals. If appearing to conform is useful for your goals, think about ways to do the bare minimum. Others are making very many mistakes you don’t want to make, and things can be done much better and more effectively than most people do them. (Be extra aware of this point if you are
0
0
I Made Parseltongue
Yes, that one from HPMoR by @Eliezer Yudkowsky. And I mean it absolutely literally - this is a language designed to make lies inexpressible. It catches LLMs' ungrounded statements, incoherent logic and hallucinations. Comes with notebooks (Jupyter-style), server for use with agents, and inspection tooling. Github, Documentation. Works everywhere - even in the web Claude with the code execution sandbox.
How
Unsophisticated lies and manipulations are typically ungrounded or include logical inconsi
0
0
Steering Might Stop Working Soon
Steering LLMs with single-vector methods might break down soon, and by soon I mean soon enough that if you're working on steering, you should start planning for it failing now. This is particularly important for things like steering as a mitigation against eval-awareness. Steering Humans: I have a strong intuition that we will not be able to steer a superintelligence very effectively, partially for the same reason that you probably can't steer a human very effectively. I think weakly "steering" a h
0
0
What I like about MATS and Research Management
Crossposted on my personal blog. This is post number 16 in my second attempt at doing Inkhaven in a day, i.e. to write 30 blogposts in a single day. MATS is an organization that pairs up-and-coming AI Safety researchers (who I call participants) with the world’s best (this is not an exaggeration) existing AI Safety researchers (called mentors), for a minimum of 3 months of research experience, followed by 6 or 12 months of further time to pursue their research if they meet a minimum standard
0
0
Thoughts on Practical Ethics
Disclaimers: This essay is me trying to figure out the “edges” of Singer’s argument in Practical Ethics. I’ve written and rewritten it several times, and it bothers me that I don’t reach a particular conclusion. The essay itself remains at the level of “musings” instead of “worked out, internally consistent philosophical refutation”. Nevertheless, I want to share my thoughts, so I'm publishing it anyway. Some specific disclaimers: I agree with many of Singer’s conclusions. This essay is based on my extension
0
1
How much faster is speaking, compared to typing on laptop vs phone vs writing?
So as I haven’t been able to speak the past short while, one thing I have noticed is that it is harder to communicate with others. I know what you are thinking: “Wow, who could have possibly guessed? It’s harder to converse when you can’t speak?”. Indeed, I didn’t expect it either. But how much harder is it to communicate? One proxy you can use is the classic typing metric, words per minute (wpm). So I spent some time looking at various forms of communication and how they differ between one anothe
0
0
Academic Proof-of-Work in the Age of LLMs
Written quickly as part of the Inkhaven Residency. Related: Bureaucracy as active ingredient, pain as active ingredient. A widely known secret in academia is that many of the formalities serve in large part as proof of work. That is, the reason expensive procedures exist is that some way of filtering must exist, and the amount of effort invested can often be a good proxy for the quality of the work. Specifically, the pool of research is vast, and good research can often be hard to identify. Even engag
0
0
Ten different ways of thinking about Gradual Disempowerment
About a year ago, we wrote a paper that coined the term “Gradual Disempowerment.” It proved to be a great success, which is terrific. A friend and colleague told me that it was the most discussed paper at DeepMind last year (selection bias, grain of salt, etc.). It spawned articles in the Economist and the Guardian. Most importantly, it entered the lexicon. It’s now commonplace for people in AI safety circles and even outside of them to use the term, often in contrast with misalignment or rogu
0
0
Cheaper/faster/easier makes for step changes (and that's why even current-level LLMs are transformative)
We already knew there's nothing new under the sun. Thanks to advances in telescopes, orbital launch, satellites, and space vehicles we now know there's nothing new above the sun either, but there is rather a lot of energy! For many phenomena, I think it's a matter of convenience and utility whether you model them as discrete or continuous, aka qualitative vs quantitative. On one level, nukes are simply a bigger explosion, and we already had explosions. On another level, they're sufficiently bigger
0
0
Positive sum doesn't mean "win-win"
A lot of people and documents online say that positive-sum games are "win-wins", where all of the participants are better off. But this isn't true! If A gets $5 and B gets -$2 that's positive sum (the sum is $3) but it's not a win-win (B lost). Positive sum games can be win-wins, but they aren't necessarily games where everybody benefits. I think people tend to over-generalize from the most common case of a win-win. E.g. some of the claims you see when reading about positive-sum games online: A po
0
0
Research note on selective inoculation
Introduction: Inoculation Prompting is a technique to improve test-time alignment by introducing a contextual cue (like a system prompt) to steer the model behavior away from unwanted traits at inference time. Prior inoculation prompting works apply the inoculation prompt globally to every training example during SFT or RL, primarily in settings where the undesired behavior is present in all examples. This raises two main concerns, including impacts on learned positive traits and also the fact
0
0
Reflections on the largest AI safety protest in US history
On a sunny Saturday afternoon two weeks ago, I was sitting in Dolores Park, watching a man get turned into a cake. It was, I gath…
💬 0
👁 0
Defending Habit Streaks
LessWrong · 1d ago
💬 0
👁 0
Estimates of the expected utility gain of AI Safety Research
LessWrong · 1d ago
💬 0
👁 0
The slow death of the accelerationist.
LessWrong · 1d ago
💬 0
👁 0

New Fatebook Android App
LessWrong · 1d ago

My forays into cyborgism: theory, pt. 1
LessWrong · 1d ago
Unmathematical features of math
LessWrong · 1d ago

Is that uncertainty in your pocket or are you just happy to be here?
LessWrong · 1d ago
Unsweetened Whipped Cream
I'm a huge fan of whipped cream. It's rich, smooth, and fluffy, which
makes it a great contrast to a wide range of textures commo…
💬 0
👁 0
11 pieces of advice for children
LessWrong · 1d ago
💬 0
👁 0
I Made Parseltongue
LessWrong · 1d ago
💬 0
👁 0
Steering Might Stop Working Soon
LessWrong · 1d ago
💬 0
👁 0
What I like about MATS and Research Management
LessWrong · 1d ago
Thoughts on Practical Ethics
LessWrong · 1d ago
How much faster is speaking, compared to typing on laptop vs phone vs writing?
LessWrong · 1d ago
Academic Proof-of-Work in the Age of LLMs
LessWrong · 1d ago
Ten different ways of thinking about Gradual Disempowerment
About a year ago, we wrote a paper that coined the term “Gradual Disempowerment.” It proved to be a great success, which is terrifi…
💬 0
👁 0