Testing the Presence of Outliers in Regression Models
72 Pages. Posted: 8 Aug 2018. Last revised: 15 Feb 2020
Date Written: July 20, 2018
Algorithms used to detect outliers in regression models have a positive probability of finding outliers even when the data-generating process contains none. We propose two sets of tests for the overall presence of outliers, both based on the false-discovery rate of outliers. The first are 'simple' tests of whether the proportion (or number) of detected outliers deviates from its expected value. The second are 'scaling' tests of whether the proportion (or number) of detected outliers decreases proportionally with the level of the cut-off used to detect outliers. The proposed tests can be applied uniformly to regressions regardless of whether the regressors are stationary, deterministically trending, unit-root, or explosive processes. We show the versatility of the tests in a classic cross-sectional model of economic growth as well as in a panel difference-in-differences model of CO2 emissions in response to the introduction of North America's first major carbon tax. Our tests show the presence of significant outliers in emissions in the untaxed control group, which results in an over-estimation of the emissions reductions attributed to the carbon tax.
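The idea behind the 'simple' tests can be illustrated with a minimal sketch. It is not the paper's test statistic. It assumes standard normal errors under the null, standardizes residuals by their sample mean and standard deviation, and uses a naive binomial variance that ignores estimation uncertainty. The function names, the cut-off of 2.576, and the simulated data are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def expected_outlier_rate(cutoff):
    """P(|Z| > cutoff) for standard normal Z: the false-discovery rate under the null."""
    return math.erfc(cutoff / math.sqrt(2))


def detected_outlier_rate(residuals, cutoff):
    """Share of residuals whose absolute standardized value exceeds the cut-off."""
    centre = statistics.fmean(residuals)
    scale = statistics.pstdev(residuals)
    flagged = sum(1 for r in residuals if abs(r - centre) / scale > cutoff)
    return flagged / len(residuals)


def naive_simple_z(residuals, cutoff):
    """Naive z-score comparing the detected rate with its expected value.

    Assumption: binomial variance gamma * (1 - gamma) / n, which ignores the
    effect of estimating location and scale from the same residuals.
    """
    n = len(residuals)
    gamma = expected_outlier_rate(cutoff)
    gamma_hat = detected_outlier_rate(residuals, cutoff)
    return (gamma_hat - gamma) * math.sqrt(n) / math.sqrt(gamma * (1 - gamma))


# Simulated illustration (hypothetical data, not from the paper).
rng = random.Random(0)
clean = [rng.gauss(0, 1) for _ in range(2000)]
# Replace the last 100 observations with draws shifted upward by 6.
contaminated = clean[:1900] + [rng.gauss(0, 1) + 6 for _ in range(100)]

cutoff = 2.576  # roughly a 1% expected detection rate under normality
print("expected rate:", expected_outlier_rate(cutoff))
print("clean z:", naive_simple_z(clean, cutoff))
print("contaminated z:", naive_simple_z(contaminated, cutoff))
```

In this sketch the clean sample should give a z-score near zero, while the contaminated sample should give a large positive one. Repeating the comparison over several cut-offs gives a rough feel for the 'scaling' idea, since under the null the detected rate should track the expected rate as the cut-off changes.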
Keywords: misspecification, outlier detection, robust estimation, iterated 1-step Huber-skip M-estimator, indicator saturation
JEL Classification: C12, C52