From Lexicons to Large Language Models: A Holistic Evaluation of Psychometric Text Analysis in Social Science Research
49 Pages · Posted: 24 Apr 2024 · Last revised: 9 Apr 2025
Date Written: March 28, 2024
Abstract
Extracting psychological constructs from text is critical for social science researchers studying attitudes, perceptions, and behaviors across online platforms and other forms of written and spoken communication. Most paradigms for measuring psychological constructs are supervised methods that rely heavily on large quantities of human-labeled data, and this dependence on human annotators may introduce noise into the underlying datasets. In this study, we perform a holistic evaluation of four major paradigms for extracting psychological constructs from text, covering a broad set of performance dimensions and complementing them with an in-depth analysis informed by dual processing theory. We demonstrate that Large Language Models (LLMs) achieve performance comparable or superior to traditional methods, exhibiting high predictive accuracy, consistency across diverse text samples, and fairness, while reducing or eliminating the need for domain or NLP expertise and costly manual annotations. From the perspective of dual processing theory, we further investigate how human annotations are influenced by the alignment between individuals’ cognitive and affective abilities and the psychological constructs being extracted.
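To make the contrast between paradigms concrete, the oldest of the four, lexicon-based measurement, can be sketched in a few lines. This is a minimal illustration only: the construct name and word list below are invented for the example and are not taken from the study.

```python
import re

# Illustrative mini-lexicon for a hypothetical "positive affect" construct.
# Real lexicons (e.g., LIWC-style dictionaries) contain thousands of entries;
# LLM-based paradigms replace this lookup with a prompt to a model instead.
POSITIVE_AFFECT = {"happy", "joy", "delighted", "love", "excited", "glad"}

def lexicon_score(text: str, lexicon: set[str] = POSITIVE_AFFECT) -> float:
    """Return the fraction of tokens that match the construct lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(token in lexicon for token in tokens)
    return hits / len(tokens)
```

For example, `lexicon_score("I am so happy and excited today")` matches 2 of 7 tokens. Such scorers need no labeled training data, but their accuracy hinges entirely on the coverage of the hand-built word list, which is one reason supervised and LLM-based paradigms are compared against them.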
Keywords: natural language processing (NLP), data annotation, data labeling, transformers, large language models (LLMs), ChatGPT, psychometrics, generative AI
Suggested Citation:
From Lexicons to Large Language Models: A Holistic Evaluation of Psychometric Text Analysis in Social Science Research
(March 28, 2024). Available at SSRN: https://ssrn.com/abstract=4776480 or http://dx.doi.org/10.2139/ssrn.4776480