Longitudinal Classification and Predictive Modeling for Historical CPS Data Using Random Forests

2022 Systems and Information Engineering Design Symposium (SIEDS)

6 Pages. Posted: 18 May 2022

Cecile K. Johnson

University of Virginia - School of Data Science

Hannah Schmuckler

University of Virginia - School of Data Science

Date Written: April 29, 2022

Abstract

The US Census Bureau uses its decennial census codes for industry and occupation in the monthly Current Population Survey. The Census Bureau has regularly revised these three- and four-digit codes to more accurately reflect the reality of work in the United States. These changes make it difficult to study industries and occupations over time. While limited crosswalks exist, there is currently no way to translate an individual’s coded occupation or industry to every other scheme for long-term comparison by social scientists. This project aims to impute the most likely code for an individual’s occupation and industry into each year’s coding scheme by using random forest models to translate industry and occupation across decades. To our knowledge, this is the first tool that can map industry and occupation at scale with a high degree of accuracy into any year’s scheme.
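The imputation idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the codes (`A10`, `B100`, and so on), the feature names `old_occ` and `old_ind`, and the helper functions are all hypothetical, and a small pure-Python random forest stands in for whatever library the paper used. The sketch assumes one old occupation code splits into two new codes depending on industry, which is the kind of one-to-many change a fixed crosswalk cannot resolve on its own.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features, rng, depth=0, max_depth=5):
    """Grow one tree using multiway splits on categorical features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1 or not features:
        return majority  # leaf: predict the most common new code
    # Consider only a random subset of features at each node.
    k = max(1, int(len(features) ** 0.5))
    best = None
    for f in rng.sample(features, k):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if best is None or score < best[0]:
            best = (score, f)
    f = best[1]
    remaining = [g for g in features if g != f]
    children = {}
    for value in set(row[f] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[f] == value]
        children[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            remaining, rng, depth + 1, max_depth)
    return {"feature": f, "children": children, "default": majority}

def predict_tree(node, row):
    """Walk down the tree; fall back to the node majority for unseen values."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["feature"]], node["default"])
    return node

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the training records."""
    rng = random.Random(seed)
    features = list(rows[0].keys())
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], features, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote across trees gives the imputed code."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: records coded under both an old and a new scheme.
# Old occupation A10 splits into B100 or B200 depending on industry.
train_rows, train_labels = [], []
for _ in range(20):
    train_rows += [
        {"old_occ": "A10", "old_ind": "I1"},
        {"old_occ": "A10", "old_ind": "I2"},
        {"old_occ": "A20", "old_ind": "I1"},
        {"old_occ": "A20", "old_ind": "I2"},
    ]
    train_labels += ["B100", "B200", "B300", "B300"]

forest = fit_forest(train_rows, train_labels)
print(predict_forest(forest, {"old_occ": "A10", "old_ind": "I1"}))
```

In this toy setup the old occupation code alone cannot separate `B100` from `B200`, so the forest has to use industry as well. Real CPS records would presumably need many more covariates and one model per target scheme, but those details are not given in the abstract.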

Keywords: Random Forest, Current Population Survey, Industry, Occupation, Longitudinal Classification, Data Science

Suggested Citation

Johnson, Cecile K. and Schmuckler, Hannah, Longitudinal Classification and Predictive Modeling for Historical CPS Data Using Random Forests (April 29, 2022). 2022 Systems and Information Engineering Design Symposium (SIEDS), Available at SSRN: https://ssrn.com/abstract=4094304 or http://dx.doi.org/10.2139/ssrn.4094304

Cecile K. Johnson

University of Virginia - School of Data Science

Hannah Schmuckler (Contact Author)

University of Virginia - School of Data Science
