Improving Low-Probability Judgments

72 Pages Posted: 10 Jan 2025

Pavel D. Atanasov

IE University; Pytho LLC

Coralie Consigny

Independent

Ezra Karger

Federal Reserve Bank of Chicago

Philipp Schoenegger

London School of Economics & Political Science (LSE)

David V. Budescu

Fordham University - Fordham College at Rose Hill

Philip Tetlock

University of Pennsylvania

Date Written: November 17, 2024

Abstract

High-stakes debates often pivot on clashing estimates of outcomes that one side sees as so improbable as not to deserve policy prioritization. These debates are especially intractable when they focus on rare events ranging from disasters (e.g., existential risks from Artificial Intelligence, nuclear war, or bioengineered pandemics) to surprising successes (e.g., once inconceivable scientific discoveries). The research literature offers grounds for suspecting that the micro-probability judgments flowing into such debates are both unreliable and biased. This article covers experimental manipulations that achieve improvements in accuracy for low-probability judgments by shifting from the standard linear elicitation scale and Brier scoring rule to nonlinear (logarithmic) elicitation scales and logarithmic scoring rules. These methodological changes produced accuracy improvements of approximately d = 0.2 to 0.5 for individual accuracy scores. Improvements in aggregate accuracy varied more widely by aggregation function (mean vs. median) and accuracy scoring rule, between parity (d = 0) and a large advantage for nonlinear over linear scales (d = 0.68). Judgments obtained via the linear scale and text box elicitations systematically overestimated the true values. New scales allowed forecasters to provide precise judgments at the low end of the probability scale, and logarithmic scoring rules penalize large errors harshly, incentivising judges to avoid 0% and provide precise non-zero probabilities. An indirect elicitation protocol we developed, successive menus, yielded mixed results, such as improving aggregate accuracy and individual calibration at the cost of increasing outlier judgments and reducing retention. Base rate anchors provided context but no measurable accuracy benefits. These results point to next steps for improving probability judgments of rare events.
The most promising next steps include a) using subject-specific base rate anchors, b) developing training programs specific to low-probability events, c) developing more robust and usable indirect elicitation protocols, and d) assessing all of these methods in a longitudinal forecasting tournament featuring many forecasting questions focused on rare events.
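
The abstract's contrast between the Brier and logarithmic scoring rules can be illustrated with a minimal sketch. It assumes the common single-outcome form of each rule for a binary event; the paper's exact formulations may differ, and the function names and forecast values below are hypothetical, chosen only for illustration.

```python
import math


def brier_score(p: float, outcome: int) -> float:
    """Quadratic (Brier) penalty for a binary event; lower is better.

    Assumption: single-outcome form (p - outcome)^2, bounded in [0, 1].
    """
    return (p - outcome) ** 2


def log_score(p: float, outcome: int) -> float:
    """Logarithmic penalty (negative log-likelihood); lower is better.

    Assumption: natural log of the probability assigned to what happened.
    """
    prob_of_realized = p if outcome == 1 else 1.0 - p
    if prob_of_realized == 0.0:
        # Assigning 0% to something that happens is penalized without bound.
        return math.inf
    return -math.log(prob_of_realized)


if __name__ == "__main__":
    # Hypothetical forecasts for a rare event that does occur (outcome = 1).
    for p in (0.0, 0.0001, 0.01, 0.05):
        print(f"p={p:<7} brier={brier_score(p, 1):.4f} log={log_score(p, 1):.4f}")
```

Under these assumptions, a judgment of 0% for an event that occurs receives the maximum Brier penalty of 1 but an unbounded logarithmic penalty, and the Brier penalty barely distinguishes a 0.01% judgment from a 1% judgment while the logarithmic penalty separates them clearly.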

Suggested Citation

Atanasov, Pavel D. and Consigny, Coralie and Karger, Ezra and Schoenegger, Philipp and Budescu, David V. and Tetlock, Philip, Improving Low-Probability Judgments (November 17, 2024). Available at SSRN: https://ssrn.com/abstract=5025990 or http://dx.doi.org/10.2139/ssrn.5025990

Pavel D. Atanasov (Contact Author)

IE University

Castellón de la Plana 8
Madrid, 28006
Spain

Pytho LLC

Madrid
Spain
641179247 (Phone)

HOME PAGE: http://pavelatanasov.net

Coralie Consigny

Independent

Ezra Karger

Federal Reserve Bank of Chicago

230 South LaSalle Street
Chicago, IL 60604
United States

Philipp Schoenegger

London School of Economics & Political Science (LSE)

Houghton Street
London, WC2A 2AE
United Kingdom

David V. Budescu

Fordham University - Fordham College at Rose Hill

United States

Philip Tetlock

University of Pennsylvania

Philadelphia, PA 19104
United States
