Testing the Presence of Outliers to Assess Misspecification in Regression Models
52 Pages Posted: 8 Aug 2018
Date Written: July 20, 2018
The presence of outlying observations in a regression model can indicate model misspecification; consequently, it is important to check for possible outlier contamination. However, algorithms used to detect outliers have a positive probability of finding outliers even when the data generation process has none. Deriving distributional results on the expected retention rate of falsely discovered outliers, we propose two sets of tests for the overall presence of outliers and thus model misspecification: first, tests of whether the observed proportion and number of detected outliers deviate from their expected values; second, ‘scaling’ tests of whether the number of detected outliers decreases proportionally with the level of significance used to detect outliers. We derive the asymptotic distribution of the tests for the presence of outliers based on iterated 1-step Huber-skip M-estimators. The first set of tests has power against the number of outliers present, while the second set has power against both outlier magnitude and number. In applications of the tests, we consider a cross-sectional macroeconomic model of economic growth and revisit a set of previous studies using indicator saturation. The tests are valid for stationary as well as (stochastically) trending regressors and can readily be implemented using Autometrics in PcGive or the R package gets.
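The idea behind the first set of tests can be illustrated with a rough sketch: if outliers are flagged at significance level gamma and the model is correctly specified, roughly a fraction gamma of observations should be flagged by chance. The Python code below compares the observed proportion with gamma using a plain binomial normal approximation. This is an assumption made for illustration only: it is not the asymptotic distribution derived in the paper for iterated 1-step Huber-skip M-estimators, and the function name `naive_proportion_test` and the example counts are hypothetical.

```python
import math


def naive_proportion_test(n_detected, n_obs, gamma):
    """Naive z-test of the detected-outlier proportion against gamma.

    Assumption: uses the simple binomial variance gamma * (1 - gamma) / n_obs,
    which ignores the effect of estimating the regression and is NOT the
    asymptotic variance derived in the paper.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if n_obs <= 0 or not 0 <= n_detected <= n_obs:
        raise ValueError("need 0 <= n_detected <= n_obs and n_obs > 0")

    observed = n_detected / n_obs
    std_err = math.sqrt(gamma * (1 - gamma) / n_obs)
    z = (observed - gamma) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical example: 12 outliers flagged among 200 observations at gamma = 0.01,
# where about 2 would be expected by chance.
z_stat, p_val = naive_proportion_test(n_detected=12, n_obs=200, gamma=0.01)
print(f"z = {z_stat:.2f}, p = {p_val:.3g}")
```

A large positive statistic suggests more detected outliers than chance alone would produce. The ‘scaling’ tests rest on a related intuition: under the null, halving gamma should roughly halve the number of detected outliers.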
Keywords: misspecification, outlier detection, robust estimation, iterated 1-step Huber-skip M-estimator, indicator saturation
JEL Classification: C12, C52