AI Alignment is Not Enough to Make the Future Go Well

Stanford Existential Risks Conference, 2023

12 Pages Posted: 30 Jan 2024

Date Written: April 2, 2023

Abstract

AI alignment is commonly explained as aligning advanced AI systems with human values. Especially when combined with the idea that AI systems optimize the world according to their goals, this has led to the belief that solving the problem of AI alignment will pave the way for an excellent future. However, this common definition of AI alignment is somewhat idealistic and misleading: most alignment research on cutting-edge systems focuses on aligning AI with task preferences (training AIs to solve user-provided tasks in a helpful manner) and on reducing the risk that an AI would pursue catastrophic goals.

We can conceptualize three different targets of alignment: alignment to task preferences, human values, or idealized values.

Extrapolating from the deployment of advanced systems such as GPT-4 and from studying economic incentives, we can expect AIs aligned with task preferences to be the dominant form of aligned AIs by default.

Aligning AI to task preferences will not by itself solve major problems for the long-term future. These problems include, among others, achieving moral progress, establishing existential security, addressing wild animal suffering, safeguarding the well-being of digital minds, reducing risks of catastrophic conflict, and optimizing for ideal values. Additional efforts are necessary to ensure that society has the capacity and will to solve these problems.

Keywords: AI alignment

Suggested Citation

Chen, Michael, AI Alignment is Not Enough to Make the Future Go Well (April 2, 2023). Stanford Existential Risks Conference, 2023, Available at SSRN: https://ssrn.com/abstract=4684068 or http://dx.doi.org/10.2139/ssrn.4684068
