44 pages. Posted: 1 Mar 2019
Date Written: February 2019
Binscatter is very popular in applied microeconomics. It provides a flexible yet parsimonious way of visualizing and summarizing “big data” in regression settings, and it is often used for informal testing of substantive hypotheses such as linearity or monotonicity of the regression function. This paper presents a foundational, thorough analysis of binscatter: we give an array of theoretical and practical results that aid both in understanding current practices (that is, their validity or lack thereof) and in offering theory-based guidance for future applications. Our main results include a principled choice of the number of bins, confidence intervals and bands, hypothesis tests for parametric and shape restrictions of the regression function, and several other new methods, applicable to canonical binscatter as well as to its higher-order polynomial, covariate-adjusted, and smoothness-restricted extensions. In particular, we highlight important methodological problems related to covariate adjustment methods used in current practice. We also discuss extensions to clustered data. Our results are illustrated with simulated and real data throughout. Companion general-purpose software packages for Stata and R are provided. Finally, from a technical perspective, new theoretical results for partitioning-based series estimation are obtained that may be of independent interest.
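For readers unfamiliar with the basic construction, the following is a minimal sketch of a canonical binned scatter plot as it is commonly described: the covariate is split into quantile-spaced bins and the outcome is averaged within each bin. It is an illustration only and is not the companion Stata or R software; the function name `binscatter_means`, the default `n_bins=10`, and the simulated data are assumptions made for this example, and none of the paper's bin selection, confidence bands, or covariate adjustment is implemented.

```python
import numpy as np

def binscatter_means(x, y, n_bins=10):
    """Within-bin means of x and y over quantile-spaced bins of x.

    Hypothetical helper for illustration; n_bins is chosen by hand here,
    not by any data-driven rule.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Bin edges at evenly spaced empirical quantiles of x.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    # Map each observation to a bin index in 0..n_bins-1 using only the
    # interior edges, so the minimum and maximum fall in the outer bins.
    idx = np.searchsorted(edges[1:-1], x, side="right")
    # Heavy ties in x can leave a bin empty, in which case its mean is NaN.
    x_means = np.array([x[idx == j].mean() for j in range(n_bins)])
    y_means = np.array([y[idx == j].mean() for j in range(n_bins)])
    return x_means, y_means

# Simulated data (an assumption for the demo): a linear regression function.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1000)
y = 2.0 * x + rng.normal(0.0, 0.1, 1000)
bx, by = binscatter_means(x, y, n_bins=10)
```

Plotting `by` against `bx` gives the familiar dot display. Because the data here are simulated from a linear model, the dots should lie close to the line with slope 2.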
Keywords: binned scatter plot, regressogram, piecewise polynomials, splines, partitioning estimators, nonparametric regression, robust bias correction, uniform inference, binning selection
JEL Classification: C14, C18, C21