Inference with Arbitrary Clustering
34 pages. Posted: 9 Sep 2019
Analyses of spatial or network data are now very common. Yet statistical inference is challenging because unobserved heterogeneity can be correlated across neighboring observational units. We develop an estimator for the variance-covariance matrix (VCV) of OLS and 2SLS that allows for arbitrary dependence of the errors across observations in space or network structure, and across time periods. As a proof of concept, we conduct Monte Carlo simulations in a geospatial setting based on US metropolitan areas; tests based on our VCV estimator reject the null hypothesis at the correct rate asymptotically, whereas conventional inference methods (e.g., those without clusters, or with clusters based on administrative units) reject it too often. We also provide simulations in a network setting based on the IDEAS structure of co-authorship and real-life data on scientific performance; the Monte Carlo results again show that our estimator yields inference at the right significance level already in moderately sized samples, and that it dominates other commonly used approaches to inference in networks. We provide guidance to the applied researcher on (i) whether to include potentially correlated regressors and (ii) the choice of cluster bandwidth. Finally, we provide a companion statistical package (acreg) that enables users to adjust the standard errors of OLS and 2SLS coefficients to account for arbitrary dependence.
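To make the idea of "arbitrary dependence" concrete, the following is a minimal sketch of a generic sandwich-type VCV for OLS in which the analyst supplies a matrix indicating which pairs of observations may have correlated errors. It is an illustration under stated assumptions, not the paper's estimator and not the interface of acreg: the function name `ols_arbitrary_cluster_vcv`, the 0/1 matrix `S`, the simulated data, and the distance cutoff are all hypothetical, and no small-sample correction, time dimension, or 2SLS step is included.

```python
import numpy as np


def ols_arbitrary_cluster_vcv(X, y, S):
    """Return OLS coefficients and a sandwich VCV with dependence pattern S.

    S is an (n, n) symmetric 0/1 matrix (a hypothetical input): S[i, j] = 1
    when the errors of observations i and j are allowed to be correlated.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    S = np.asarray(S, dtype=float)

    XtX_inv = np.linalg.inv(X.T @ X)   # "bread" of the sandwich
    beta = XtX_inv @ X.T @ y           # OLS coefficients
    resid = y - X @ beta               # OLS residuals

    # "Meat": sum over linked pairs (i, j) of x_i * e_i * e_j * x_j'
    Xe = X * resid[:, None]
    meat = Xe.T @ S @ Xe

    vcv = XtX_inv @ meat @ XtX_inv
    return beta, vcv


# Illustrative use on simulated data (all values below are assumptions).
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0.0, 10.0, size=(n, 2))   # hypothetical locations
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

# Link every pair of observations closer than a chosen distance cutoff.
cutoff = 1.0
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
S = (dist <= cutoff).astype(float)

beta_hat, vcv_hat = ols_arbitrary_cluster_vcv(X, y, S)
std_errors = np.sqrt(np.diag(vcv_hat))
```

In this sketch, setting `S` to the identity matrix reduces the formula to the heteroskedasticity-robust (HC0) VCV, and setting `S[i, j] = 1` only for observations in the same group reduces it to a cluster-robust VCV without finite-sample adjustment. A network adjacency matrix could be passed in place of the distance-based `S`.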
Keywords: clustering, arbitrary, geospatial data, network data
JEL Classification: C13, C23, C26