Affirmative Safety: An Approach to Risk Management for Advanced AI

14 Pages · Posted: 30 Apr 2024 · Last revised: 3 May 2024

Akash Wasil

Georgetown University

Joshua Clymer

Columbia University

David Krueger

University of Cambridge

Emily Dardaman

Independent

Simeon Campos

SaferAI

Evan Murphy

UC Berkeley

Date Written: April 24, 2024

Abstract

Prominent AI experts have suggested that companies developing high-risk AI systems should be required to show that such systems are safe before they can be developed or deployed. The goal of this paper is to expand on this idea and explore its implications for risk management. We argue that entities developing or deploying high-risk AI systems should be required to present evidence of “affirmative safety”: a proactive case that their activities keep risks below acceptable thresholds. We begin the paper by highlighting global security risks from AI that have been acknowledged by AI experts and world governments. Next, we briefly describe principles of risk management from other high-risk fields (e.g., nuclear safety). Then, we propose a risk management approach for advanced AI in which model developers must provide evidence that their activities keep certain risks below regulator-set thresholds. As a first step toward understanding what affirmative safety cases should include, we illustrate how certain kinds of technical evidence and operational evidence can support an affirmative safety case. In the technical section, we discuss behavioral evidence (evidence about model outputs), cognitive evidence (evidence about model internals), and developmental evidence (evidence about the training process). In the operational section, we offer examples of organizational practices that could contribute to affirmative safety cases: information security practices, safety culture, and emergency response capacity. Finally, we briefly compare our approach to the NIST AI Risk Management Framework. Overall, we hope our work contributes to ongoing discussions about national and global security risks posed by AI and regulatory approaches to address these risks.

Suggested Citation

Wasil, Akash and Clymer, Joshua and Krueger, David and Dardaman, Emily and Campos, Simeon and Murphy, Evan, Affirmative Safety: An Approach to Risk Management for Advanced AI (April 24, 2024). Available at SSRN: https://ssrn.com/abstract=4806274 or http://dx.doi.org/10.2139/ssrn.4806274

Akash Wasil (Contact Author)

Georgetown University

Washington, DC 20057
United States

Joshua Clymer

Columbia University

David Krueger

University of Cambridge

Emily Dardaman

Independent

United States

Simeon Campos

SaferAI

Paris
France

Evan Murphy

UC Berkeley
