Combining Panel Data Sets with Attrition and Refreshment Samples

41 Pages. Posted: 25 Jul 2000. Last revised: 18 Feb 2023.

Keisuke Hirano

Pennsylvania State University, College of the Liberal Arts - Department of Economics

Guido W. Imbens

Stanford Graduate School of Business

Geert Ridder

University of Southern California

Donald B. Rubin

Harvard University - Department of Statistics

Date Written: April 1998

Abstract

In many fields, researchers wish to consider statistical models that allow for more complex relationships than can be inferred using only cross-sectional data. Panel or longitudinal data, in which the same units are observed repeatedly at different points in time, can often provide the richer data needed for such models. Although such data allow researchers to identify more complex models than cross-sectional data do, missing data problems can be more severe in panels. In particular, even units that respond in the initial waves of the panel may drop out in subsequent waves, so that the subsample with complete data for all waves can be less representative of the population than the original sample. Sometimes, in the hope of mitigating the effects of attrition without losing the advantages of panel data over cross-sections, panel data sets are augmented by replacing units that have dropped out with new units randomly sampled from the original population. Following Ridder (1992), who used these replacement units to test some models for attrition, we call such additional samples refreshment samples. We explore the benefits of these samples for estimating models of attrition. We describe how the presence of refreshment samples allows the researcher to test various models for attrition in panel data, including models based on the assumption that missing data are missing at random (MAR; Rubin, 1976; Little and Rubin, 1987). The main result of the paper makes precise the extent to which refreshment samples are informative about the attrition process: a class of non-ignorable missing data models can be identified without strong distributional or functional form assumptions if refreshment samples are available.
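
To make concrete the idea that a refreshment sample can be used to check an MAR assumption, the following is a minimal simulation sketch. It is not the estimator developed in the paper. It assumes a two-wave panel with binary outcomes, and every name and parameter value in it (`simulate`, `mar_implied_mean_y2`, `mar_gap`, the response probabilities) is a hypothetical choice made for illustration. Under MAR, attrition depends only on the first-wave outcome, so reweighting the stayers by the first-wave distribution should reproduce the second-wave mean that the refreshment sample estimates directly. A persistent gap between the two suggests that attrition depends on the unobserved second-wave outcome.

```python
import numpy as np


def simulate(n, attrition, rng):
    """Draw a two-wave binary panel (hypothetical data-generating process).

    attrition="mar": staying depends only on the wave-1 outcome y1.
    attrition="nonignorable": staying depends on the wave-2 outcome y2.
    """
    y1 = rng.binomial(1, 0.5, size=n)
    y2 = rng.binomial(1, np.where(y1 == 1, 0.7, 0.3))
    driver = y1 if attrition == "mar" else y2
    p_stay = np.where(driver == 1, 0.9, 0.5)
    stay = rng.random(n) < p_stay
    return y1, y2, stay


def mar_implied_mean_y2(y1, y2, stay):
    """Mean of y2 implied by MAR: sum over v of P(y1=v) * E[y2 | y1=v, stay].

    Only the y2 values of stayers are used, mimicking the fact that
    y2 is unobserved for units that dropped out.
    """
    total = 0.0
    for v in (0, 1):
        cell = y1 == v
        total += cell.mean() * y2[cell & stay].mean()
    return total


def mar_gap(attrition, n=200_000, seed=0):
    """MAR-implied mean of y2 minus the refreshment-sample mean of y2."""
    rng = np.random.default_rng(seed)
    y1, y2, stay = simulate(n, attrition, rng)
    # Refreshment sample: fresh draws from the same population, observed
    # only in wave 2, so it is unaffected by panel attrition.
    _, y2_refresh, _ = simulate(n, attrition, rng)
    return mar_implied_mean_y2(y1, y2, stay) - y2_refresh.mean()


if __name__ == "__main__":
    print("gap when attrition is MAR:          ", round(mar_gap("mar"), 3))
    print("gap when attrition is non-ignorable:", round(mar_gap("nonignorable"), 3))
```

In this sketch the gap should be close to zero when attrition is MAR and clearly positive when staying depends on the second-wave outcome, since units with `y2 = 1` are then over-represented among the stayers. A real application would also need a standard error for the gap before drawing any conclusion.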

Suggested Citation

Hirano, Keisuke and Imbens, Guido W. and Ridder, Geert and Rubin, Donald B., Combining Panel Data Sets with Attrition and Refreshment Samples (April 1998). NBER Working Paper No. t0230, Available at SSRN: https://ssrn.com/abstract=226640

Keisuke Hirano (Contact Author)

Pennsylvania State University, College of the Liberal Arts - Department of Economics

524 Kern Graduate Building
University Park, PA 16802-3306
United States

Guido W. Imbens

Stanford Graduate School of Business

655 Knight Way
Stanford, CA 94305-5015
United States

Geert Ridder

University of Southern California

Kaprielian Hall
Los Angeles, CA 90089
United States
213-740-2110 (Phone)
213-740-8543 (Fax)

Donald B. Rubin

Harvard University - Department of Statistics

Science Center 7th floor
One Oxford Street
Cambridge, MA 02138-2901
United States