Patent Text and Long-Run Innovation Dynamics: The Critical Role of Model Selection

68 Pages. Posted: 17 Sep 2024. Last revised: 27 Nov 2024.

Ina Ganguli

University of Massachusetts at Amherst - College of Social and Behavioral Sciences - Department of Economics; Harvard University - Harvard Kennedy School (HKS), Center for International Development

Jeffrey Lin

Federal Reserve Banks - Federal Reserve Bank of Philadelphia

Vitaly Meursault

Federal Reserve Banks - Federal Reserve Bank of Philadelphia

Nicholas Reynolds

University of Essex

Date Written: September 2024

Abstract

As distorted maps may mislead, Natural Language Processing (NLP) models may misrepresent. How do we know which NLP model to trust? We provide comprehensive guidance for selecting and applying NLP representations of patent text. We develop novel validation tasks to evaluate several leading NLP models. These tasks assess how well candidate models align with both expert and non-expert judgments of patent similarity. State-of-the-art language models significantly outperform traditional approaches such as TF-IDF. Using our validated representations, we measure a secular decline in contemporaneous patent similarity: inventors are “spreading out” over an expanding knowledge frontier. This finding is corroborated by declining rates of multiple invention from newly-digitized historical patent interference records. In contrast, selecting another single representation without validating alternatives yields an ambiguous or even opposing trend. Thus, our framework addresses a fundamental challenge of selecting among different black-box NLP models that produce varying economic measurements. To facilitate future research, we plan to provide our validation task data and embeddings for all US patents from 1836–2023.
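To make the idea of validating a text representation against human judgments concrete, the following is a minimal sketch, not the authors' method. It builds a simple TF-IDF representation (the traditional baseline named in the abstract) and checks how often its cosine similarities agree with judgments of the form "patent A is closer to B than to C". The toy patent texts, the triplet format, and the function names (tfidf_vectors, cosine, agreement_rate) are all assumptions made for illustration; the paper's actual validation tasks and models are not described on this page.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms found in every document keep a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agreement_rate(vectors, triplets):
    """Share of (anchor, closer, farther) judgments the representation reproduces."""
    hits = sum(
        cosine(vectors[a], vectors[c]) > cosine(vectors[a], vectors[f])
        for a, c, f in triplets
    )
    return hits / len(triplets)


# Toy texts invented for illustration; they are not real patent data.
patents = [
    "rotary engine with improved piston seal",
    "piston seal for a rotary combustion engine",
    "telegraph relay with electromagnetic switch",
    "electromagnetic switch for telegraph signal relay",
]
# Hypothetical human judgments: (anchor, judged closer, judged farther).
triplets = [(0, 1, 2), (2, 3, 0), (1, 0, 3)]

vectors = tfidf_vectors(patents)
score = agreement_rate(vectors, triplets)
print(f"Agreement with judgments: {score:.2f}")
```

The same agreement_rate function could be applied to vectors from any other representation, so that candidate models are compared on an identical set of judgments before one is chosen for measurement.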

Institutional subscribers to the NBER working paper series and residents of developing countries may download this paper without additional charge at www.nber.org.

Suggested Citation

Ganguli, Ina and Lin, Jeffrey and Meursault, Vitaly and Reynolds, Nicholas, Patent Text and Long-Run Innovation Dynamics: The Critical Role of Model Selection (September 2024). NBER Working Paper No. w32934, Available at SSRN: https://ssrn.com/abstract=4957388

Ina Ganguli (Contact Author)

University of Massachusetts at Amherst - College of Social and Behavioral Sciences - Department of Economics

Amherst, MA 01003
United States

Harvard University - Harvard Kennedy School (HKS), Center for International Development

79 John F. Kennedy Street
Cambridge, MA 02138
United States
617-496-9066 (Phone)

Jeffrey Lin

Federal Reserve Banks - Federal Reserve Bank of Philadelphia

Ten Independence Mall
Philadelphia, PA 19106-1574
United States

Vitaly Meursault

Federal Reserve Banks - Federal Reserve Bank of Philadelphia

Ten Independence Mall
Philadelphia, PA 19106-1574
United States

Nicholas Reynolds

University of Essex

Wivenhoe Park
Colchester CO4 3SQ
United Kingdom

HOME PAGE: http://nicholas-reynolds.com
