Linguistic Metrics for Patent Disclosure: Evidence from University Versus Corporate Patents
57 Pages. Posted: 1 Oct 2020
Date Written: 2020
Encouraging inventors to disclose new inventions is an important economic justification for the patent system, yet the technical information contained in patent applications is often inadequate and unclear. This paper proposes a novel approach to measuring disclosure in patent applications using algorithms from computational linguistics. Borrowing methods from the literature on second language acquisition, we analyze core linguistic features of 40,949 U.S. applications in three patent categories related to nanotechnology, batteries, and electricity from 2000 to 2019. Relying on the expectation that universities disclose more about their inventions than corporations do, either because their incentives differ or because their patent attorneys can draw on different source documents, we confirm the relevance and usefulness of the linguistic measures by showing that university patents are more readable. Combining the multiple measures using principal component analysis, we find that the gap in disclosure is 0.4 SD, with a wider gap between top applicants. Our results do not change after accounting for the heterogeneity of inventions by controlling for cited-patent fixed effects. We also explore whether one pathway by which corporate patents become less readable is the use of multiple examples to mask the “best mode” of inventions. By confirming that computational linguistic measures are useful indicators of the readability of patents, we suggest that the disclosure function of patents can be explored empirically in a way that has not previously been feasible.
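As a rough illustration of the step in which several linguistic measures are combined by principal component analysis and the university/corporate gap is reported in standard-deviation units, the following is a minimal sketch under stated assumptions, not the authors' implementation. The data are synthetic, the three metric columns are hypothetical placeholders rather than the paper's actual measures, and orienting the component by the sign of its first loading is an illustrative choice, since the sign of a principal component is otherwise arbitrary.

```python
import numpy as np

def first_component_scores(X):
    """Project standardized columns of X onto their first principal component."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    direction = Vt[0]
    # PCA signs are arbitrary; orient so the first metric loads positively (assumption).
    if direction[0] < 0:
        direction = -direction
    return Z @ direction, direction

def gap_in_sd(scores, is_university):
    """Difference in group means after rescaling the scores to unit variance."""
    z = (scores - scores.mean()) / scores.std()
    return z[is_university].mean() - z[~is_university].mean()

# Synthetic data only: 1,000 hypothetical applications, 3 hypothetical metrics.
rng = np.random.default_rng(0)
n = 1000
is_university = rng.random(n) < 0.3
latent = rng.normal(size=n) + 0.5 * is_university  # assumed latent readability
weights = np.array([1.0, 0.8, 0.6])                # assumed positive loadings
X = latent[:, None] * weights + rng.normal(scale=0.5, size=(n, 3))

scores, loadings = first_component_scores(X)
gap = gap_in_sd(scores, is_university)
print(f"first-component gap (university minus corporate): {gap:.2f} SD")
```

Here higher values of each hypothetical metric are assumed to mean more readable text, so a positive gap corresponds to more readable university applications. A real analysis would also need the controls the abstract mentions, such as cited-patent fixed effects, which this sketch omits.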
Keywords: patent disclosure, computational linguistic analysis, readability
JEL Classification: K110, O310, O340