Regressions, Short and Long
Posted: 17 Jul 2002
We study the problem of identifying the long regression E(y|x,z) when the short conditional distributions P(y|x) and P(z|x) are known but the long conditional distribution P(y|x,z) is not known. This problem often arises when a researcher uses data from two separate data sets. (A leading example is the ecological inference problem of political science, where voting behavior across electoral districts is observed from administrative records, the demographic composition of voters within a district is observed from census data, and the researcher wants to infer voting behavior conditional on district and demographic attributes.) We isolate an identification region containing feasible values of the long regression, and show that this region forms a sharp bound on the long regression. The identification region can be calculated precisely when y has finite support. When y has infinite support we characterize two sets, one that contains the identification region, and one that is contained in it. Following this completely nonparametric analysis, we examine the identifying power yielded by exclusion restrictions across distinct covariate values. Such restrictions cause the identification region to shrink, in many cases to a single point. To illustrate the theory, we pose and address this hypothetical question: What would be the outcome if the 1996 U.S. presidential election were re-enacted in a population of different demographic composition, ceteris paribus?
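To make the identification problem concrete, the following is a minimal sketch of the simplest special case only, and it is not taken from the paper: y and z are both assumed binary, and x is held fixed. The law of total probability gives P(y=1|x) = E(y|x,z=1) P(z=1|x) + E(y|x,z=0) P(z=0|x), and since both long regressions must lie in [0, 1], this restricts E(y|x,z=1) to an interval. The function name `long_regression_bounds` and its parameters are hypothetical names chosen for illustration.

```python
def long_regression_bounds(p_y: float, p_z: float) -> tuple[float, float]:
    """Bounds on E(y | x, z=1) for binary y and binary z at a fixed x.

    p_y: the short regression P(y=1 | x), assumed known.
    p_z: the covariate share P(z=1 | x), assumed known and positive.

    Illustrative assumption: both y and z take values in {0, 1}.
    """
    if not 0.0 <= p_y <= 1.0:
        raise ValueError("p_y must lie in [0, 1]")
    if not 0.0 < p_z <= 1.0:
        raise ValueError("p_z must lie in (0, 1]")

    # Total probability: p_y = a * p_z + b * (1 - p_z),
    # where a = E(y | x, z=1) and b = E(y | x, z=0), both in [0, 1].
    # Setting b = 1 gives the smallest feasible a; b = 0 gives the largest.
    lower = max(0.0, (p_y - (1.0 - p_z)) / p_z)
    upper = min(1.0, p_y / p_z)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical district: 60% vote for a candidate, 50% of voters are in group z=1.
    print(long_regression_bounds(0.6, 0.5))
```

In this sketch the interval collapses to a single point only when p_z equals 1, that is, when every unit at this x has z=1 and the short regression already is the long regression. Otherwise the short distributions alone leave E(y|x,z=1) partially identified, which is the situation in which restrictions of the kind discussed in the abstract can add identifying power.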