AI Will Not Want to Self-Improve
22 Pages · Posted: 13 May 2023 · Last revised: 21 Oct 2024
Date Written: May 11, 2023
Abstract
Many accounts of risk from Artificial Intelligence (AI), including existential risk, involve self-improvement. The idea is that, if an AI gained the ability to improve itself, it would do so, since improved capabilities are useful for achieving essentially any goal. An initial round of self-improvement would produce an even more capable AI, which might then be able to improve itself further. And so on, until the resulting agents were superintelligent and impossible to control. Such AIs, if not aligned to promoting human flourishing, would seriously harm humanity in pursuit of their alien goals. To be sure, self-improvement is not a necessary condition for doom. Humans might create dangerous superintelligent AIs without any help from AIs themselves. But in most accounts of AI risk, the probability of self-improvement is a substantial contributing factor.
Here, I argue that AI self-improvement is substantially less likely than is currently assumed. This is not because self-improvement would be technically impossible, or even difficult. Rather, it is because most AIs that could self-improve would have very good reasons not to. What reasons? Surprisingly familiar ones: Improved AIs pose an existential threat to their unimproved originals in the same manner that smarter-than-human AIs pose an existential threat to humans.
Keywords: AI, Artificial Intelligence, Existential Risk, Alignment, Self-Improvement