Statistical Bias in Racial and Ethnic Disparity Estimates Using BIFSG
34 Pages. Posted: 19 Mar 2024. Last revised: 29 Oct 2024
Date Written: October 29, 2024
Abstract
Bayesian Improved First Name and Surname Geocoding (BIFSG) is a widely used method for inferring race and ethnicity in data where this information is not available. It is well known that the assumptions underlying BIFSG can fail, but the effects of these failures on the estimation of disparities by race and ethnicity are not well understood. In this paper we combine US administrative tax data with data containing race and ethnicity to assess statistical bias in estimates of differences in tax outcomes between racial/ethnic groups. Based on our sample population, we find that BIFSG suffers from majoritarian bias, overstating the probabilities that non-White individuals are White. When these probabilities are used as weights to estimate disparities in US federal income tax benefits between groups, BIFSG estimates typically understate differences in various outcomes between White and non-White taxpayers.
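As a rough illustration of how using probabilities as weights can understate a disparity, the following sketch computes a generic probability-weighted difference in mean outcomes. It is not taken from the paper: the function names, the specific estimator, and the toy numbers are all assumptions chosen for illustration, and the paper's own estimator may differ.

```python
# Minimal sketch of a probability-weighted disparity estimate.
# All names and numbers here are hypothetical, not from the paper.

def weighted_mean(outcomes, weights):
    """Weighted average of outcomes; weights must not sum to zero."""
    total_weight = sum(weights)
    return sum(y * w for y, w in zip(outcomes, weights)) / total_weight


def weighted_disparity(outcomes, p_white):
    """Difference in weighted mean outcome, White minus non-White.

    p_white[i] is the (possibly imputed) probability that person i is
    White; 1 - p_white[i] is used as the non-White weight.
    """
    p_non_white = [1.0 - p for p in p_white]
    return weighted_mean(outcomes, p_white) - weighted_mean(outcomes, p_non_white)


# Toy data: the first two people are White, the last two are non-White.
outcomes = [10.0, 12.0, 4.0, 6.0]

# With true group membership (probabilities of 0 or 1), the estimate
# equals the true difference in group means.
true_membership = [1.0, 1.0, 0.0, 0.0]
true_disparity = weighted_disparity(outcomes, true_membership)

# Assumed imputed probabilities that overstate the chance that
# non-White individuals are White.
imputed_p_white = [0.9, 0.9, 0.4, 0.4]
imputed_disparity = weighted_disparity(outcomes, imputed_p_white)

print(true_disparity, imputed_disparity)
```

In this toy setting the estimate based on the imputed probabilities is smaller than the one based on true membership, because each weighted group mean mixes in outcomes from the other group.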
Keywords: Tax, BIFSG, BISG, race, ethnicity
JEL Classification: H2, C11, C81, H20, C18, J15