Splitting a Predictor at the Upper Quarter or Third and the Lower Quarter or Third

14 Pages Posted: 11 Sep 2007


Andrew Gelman

Columbia University - Department of Statistics and Department of Political Science

David Park

George Washington University

Date Written: July 2007

Abstract

A linear regression of y on x can be approximated by a simple difference: the average value of y corresponding to the highest quarter or third of x, minus the average value of y corresponding to the lowest quarter or third of x. A simple theoretical analysis shows that this comparison performs reasonably well, with 80% to 90% efficiency compared to the linear regression if the predictor is uniformly or normally distributed. Discretizing x into three categories claws back about half the efficiency lost by the commonly used strategy of dichotomizing the predictor. We illustrate with the example that motivated this research: an analysis of income and voting which we had originally performed for a scholarly journal but then wanted to communicate to a general audience.
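The efficiency figures quoted in the abstract can be checked with a short calculation. The sketch below is not taken from the paper; it assumes a linear model with independent, equal-variance errors and a sample large enough that the cut points sit at the population quantiles. Under those assumptions, the slope estimate (difference in mean y divided by difference in mean x between the top and bottom fraction f) has efficiency f * gap^2 / (2 * var(x)) relative to least squares, where gap is the difference in mean x between the two groups. The function names are hypothetical.

```python
from statistics import NormalDist


def efficiency_uniform(f):
    """Relative efficiency of the top-f minus bottom-f comparison
    versus least squares, for x ~ Uniform(0, 1).

    Assumes homoscedastic, independent errors and large samples.
    """
    # Mean of x in the top fraction f is 1 - f/2; in the bottom it is f/2.
    gap = 1.0 - f
    var_x = 1.0 / 12.0
    return f * gap ** 2 / (2.0 * var_x)


def efficiency_normal(f):
    """Same quantity for x ~ Normal(0, 1), under the same assumptions."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - f)       # cut point for the upper fraction f
    upper_mean = nd.pdf(z) / f    # E[x | x > z] for a standard normal
    gap = 2.0 * upper_mean        # lower group mean is -upper_mean by symmetry
    return f * gap ** 2 / 2.0     # var(x) = 1


if __name__ == "__main__":
    for label, f in [("halves", 0.5), ("thirds", 1 / 3), ("quarters", 0.25)]:
        print(f"{label:9s} uniform={efficiency_uniform(f):.3f} "
              f"normal={efficiency_normal(f):.3f}")
```

Under these assumptions, splitting at the median gives an efficiency of 0.75 for a uniform predictor and 2/pi (about 0.64) for a normal one, while comparing the upper and lower thirds gives 8/9 (about 0.89) and roughly 0.79, which is in line with the claim that three categories recover about half of what dichotomizing loses.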

Keywords: discretization, linear regression, statistical communication, trichotomizing

Suggested Citation

Gelman, Andrew and Park, David, Splitting a Predictor at the Upper Quarter or Third and the Lower Quarter or Third (July 2007). Available at SSRN: https://ssrn.com/abstract=1010473 or http://dx.doi.org/10.2139/ssrn.1010473

Andrew Gelman (Contact Author)

Columbia University - Department of Statistics and Department of Political Science

New York, NY 10027
United States
212-854-4883 (Phone)
212-663-2454 (Fax)

David Park

George Washington University

2121 I Street NW
Washington, DC 20052
United States