When Conservatives See Red but Liberals Feel Blue: Why Labeler-Characteristic Bias Matters for Data Annotation
60 Pages. Posted: 13 Sep 2023
Date Written: August 14, 2023
Abstract
Human annotation of data, including text and image materials, is a bedrock of political science research. Yet we often overlook how the identities of our annotators may systematically affect their labels. We call the sensitivity of labels to annotator identity "labeler-characteristic bias" (LCB). We demonstrate the persistence and risks of LCB for downstream analyses in two examples, first with image data from the United States and second with text data from the Netherlands. In both examples we observe significant differences in annotations based on annotator gender and political identity. After laying out a general typology of annotator biases and their relationship to inter-rater reliability, we provide suggestions and solutions for how to handle LCB. The first step in addressing LCB is to recruit a diverse labeler corps and test for LCB. Where LCB is found, solutions include modeling subgroup effects or generating composite labels based on target population demographics.
Keywords: human annotation, labeler characteristic bias, inter-rater reliability, text as data, images as data
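
The abstract's last suggestion, composite labels based on target population demographics, can be illustrated with a small sketch. This is not the authors' procedure. It is one possible reading that uses post-stratification-style weighting: average the labels within each annotator subgroup, then combine the subgroup averages using the subgroup's share of a target population. The function names, group names, annotations, and population shares below are all hypothetical.

```python
from collections import defaultdict


def subgroup_rates(annotations):
    """Mean binary label per item and annotator subgroup.

    annotations: iterable of (item_id, group, label) with label in {0, 1}.
    Returns {item_id: {group: mean_label}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for item, group, label in annotations:
        sums[item][group] += label
        counts[item][group] += 1
    return {
        item: {g: sums[item][g] / counts[item][g] for g in sums[item]}
        for item in sums
    }


def subgroup_gap(annotations, group_a, group_b):
    """Per-item difference in mean label between two subgroups.

    A descriptive check only, not a significance test. Items that one
    of the two groups never labeled are skipped.
    """
    rates = subgroup_rates(annotations)
    return {
        item: by_group[group_a] - by_group[group_b]
        for item, by_group in rates.items()
        if group_a in by_group and group_b in by_group
    }


def composite_labels(annotations, target_shares):
    """Weight subgroup means by (assumed) target population shares.

    Shares are renormalized over the groups that actually labeled
    each item, so a missing group does not shrink the score.
    """
    out = {}
    for item, by_group in subgroup_rates(annotations).items():
        total = sum(target_shares[g] for g in by_group)
        out[item] = sum(target_shares[g] * r for g, r in by_group.items()) / total
    return out


# Hypothetical data: one image, labeled 1 if the annotator saw it as "angry".
demo = [
    ("img1", "conservative", 1),
    ("img1", "conservative", 1),
    ("img1", "liberal", 0),
    ("img1", "liberal", 1),
    ("img1", "liberal", 0),
    ("img1", "liberal", 0),
]
# Hypothetical target population shares (assumed, not from the paper).
shares = {"conservative": 0.4, "liberal": 0.6}

gaps = subgroup_gap(demo, "conservative", "liberal")
composite = composite_labels(demo, shares)
```

In this toy case the unweighted mean label reflects whichever subgroup supplied more annotators, while the composite reflects the assumed population mix. A real analysis would also need a formal test of the subgroup gap and uncertainty estimates for the composite.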