Outliers and Robust Inference in Archival Accounting Research
82 Pages. Posted: 14 Jul 2021. Last revised: 3 May 2023
Date Written: May 1, 2023
Abstract
Archival variable distributions are frequently skewed or heavy-tailed as a result of the scaling of variables and the use of heterogeneous samples of firms. We illustrate that even in large samples, such non-normality in commonly used variables makes linear regression estimates based on OLS imprecise (i.e., inefficient) and results in low statistical power. We next show that "robust regression" estimators can limit this efficiency loss, but that their performance varies substantially across estimator type and the choice of "normal efficiency." We also find that robust regression estimators often materially reduce sample size, similar to variable-by-variable truncation, and that the non-random downweighting of observations identified as outliers can materially change coefficient estimates and their interpretation. Lastly, we show that a common approach to cluster standard errors with robust estimators leads to inflated t-statistics and potentially incorrect inferences. We provide guidance on how researchers can improve their understanding of the effects of non-normal variable distributions and how to improve the inferences drawn from robust regression estimators.
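The efficiency claim in the abstract can be illustrated with a small Monte Carlo sketch. It is not the paper's procedure: it uses a plain Huber M-estimator fitted by iteratively reweighted least squares rather than the MM-estimators the paper studies, and the sample size, error distribution (Student's t with 2.5 degrees of freedom), and function names are all assumptions chosen for illustration. The tuning constant 1.345 is the conventional Huber value associated with roughly 95% efficiency under normal errors.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def huber_slope(x, y, c=1.345, n_iter=50, tol=1e-8):
    """Slope from a Huber M-estimator fitted by IRLS (illustrative only)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from OLS
    for _ in range(n_iter):
        resid = y - X @ beta
        # Robust residual scale: normalized median absolute deviation.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 inside the threshold, c/u beyond it.
        w = c / np.maximum(u, c)
        sw = np.sqrt(w)
        new_beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta[1]


def simulate(n_sims=300, n_obs=500, true_slope=1.0, df=2.5, seed=0):
    """Sampling std. dev. of each slope estimator under heavy-tailed errors."""
    rng = np.random.default_rng(seed)
    ols, huber = [], []
    for _ in range(n_sims):
        x = rng.normal(size=n_obs)
        errors = rng.standard_t(df, size=n_obs)  # assumed error distribution
        y = 0.5 + true_slope * x + errors
        ols.append(ols_slope(x, y))
        huber.append(huber_slope(x, y))
    return float(np.std(ols)), float(np.std(huber))


sd_ols, sd_huber = simulate()
print(f"sd of OLS slope:   {sd_ols:.4f}")
print(f"sd of Huber slope: {sd_huber:.4f}")
```

Under these assumed settings the Huber slope should vary less across simulated samples than the OLS slope, which is the sense in which OLS is inefficient under heavy tails. The sketch says nothing about the paper's other findings, such as implicit sample reduction from downweighting or the behavior of clustered standard errors.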
Keywords: Skewness, non-normality, OLS, efficiency, outliers, robust regression, MM-estimation, scaling, leverage
JEL Classification: C13, C15, C18, M41