LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research

41 Pages. Posted: 28 Oct 2024

Yi Yang

HKUST Business School

Hanyu Duan

HKUST Business School

Jiaxin Liu

HKUST Business School

Kar Yan Tam

Hong Kong University of Science and Technology

Date Written: September 12, 2024

Abstract

The increasing use of text as data in social science research necessitates the development of valid, consistent, reproducible, and efficient methods for generating text-based concept measures. This paper presents a novel method that leverages the internal hidden states of large language models (LLMs) to generate these concept measures. Specifically, the proposed method learns a concept vector that captures how the LLM internally represents the target concept, then estimates the concept value for text data by projecting the text's LLM hidden states onto the concept vector. Three replication studies demonstrate the method's effectiveness in producing highly valid, consistent, and reproducible text-based measures across various social science research contexts, highlighting its potential as a valuable tool for the research community.
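The projection step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the hidden states are synthetic NumPy arrays rather than real LLM activations, the function names `learn_concept_vector` and `concept_scores` are hypothetical, and the learning step (first principal direction of centered hidden states) is a stand-in, since the abstract does not specify how the concept vector is learned.

```python
import numpy as np


def learn_concept_vector(hidden_states: np.ndarray) -> np.ndarray:
    """Stand-in learning step (an assumption, not necessarily the paper's procedure):
    return the first principal direction of the centered hidden states, unit norm."""
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def concept_scores(hidden_states: np.ndarray, concept_vector: np.ndarray) -> np.ndarray:
    """Project each text's hidden state onto the (normalized) concept vector."""
    unit = concept_vector / np.linalg.norm(concept_vector)
    return hidden_states @ unit


# Synthetic stand-in for LLM hidden states: 200 texts, 16 dimensions.
rng = np.random.default_rng(0)
dim = 16
true_direction = np.zeros(dim)
true_direction[0] = 1.0                     # hypothetical "concept" axis
latent = rng.normal(size=200)               # hypothetical concept intensity per text
states = 5.0 * np.outer(latent, true_direction) + rng.normal(scale=0.1, size=(200, dim))

vec = learn_concept_vector(states)
scores = concept_scores(states, vec)        # one scalar measure per text
```

In this toy setting the scores track the latent intensity closely, up to sign: a principal direction is only defined up to a sign flip, so a real pipeline would need some convention to orient the concept vector.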

Suggested Citation

Yang, Yi and Duan, Hanyu and Liu, Jiaxin and Tam, Kar Yan, LLM-Measure: Generating Valid, Consistent, and Reproducible Text-Based Measures for Social Science Research (September 12, 2024). HKUST Business School Research Paper No. 2024-186, Available at SSRN: https://ssrn.com/abstract=4961494 or http://dx.doi.org/10.2139/ssrn.4961494

Yi Yang (Contact Author)

HKUST Business School

Clear Water Bay
Kowloon
Hong Kong

Hanyu Duan

HKUST Business School

Clear Water Bay
Kowloon
Hong Kong

Jiaxin Liu

HKUST Business School

Clear Water Bay
Kowloon
Hong Kong

Kar Yan Tam

Hong Kong University of Science and Technology

Clear Water Bay, Kowloon
Hong Kong
