Patent Similarity Data and Innovation Metrics
Journal of Empirical Legal Studies, 2020
26 Pages. Posted: 28 Aug 2020
Date Written: July 21, 2020
We introduce and describe the Patent Similarity Data set, comprising vector space model-based similarity scores for United States utility patents. The data set provides approximately 640 million pre-calculated similarity scores, as well as the code and computed vectors required to calculate further pairwise similarities. In addition to the raw data, we introduce measures that leverage patent similarity to provide insight into innovation and intellectual property law issues of interest to both scholars and policymakers. Code is provided in accompanying scripts to assist researchers in obtaining the data set, joining it with other available patent data, and using it in their research.
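The abstract notes that the computed vectors let researchers calculate further pairwise similarities beyond the pre-calculated scores. Below is a minimal sketch of that step. It assumes the score is the cosine similarity between two patents' document vectors, which is a common choice for vector space models but is not confirmed here. The function names, the patent IDs (`patent_A` and so on) and the 3-dimensional toy vectors are hypothetical. Real Doc2Vec vectors are much longer and would first be loaded from the data set's files, which is not shown.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (assumed scoring rule)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def pairwise_similarities(vectors):
    """Return {(id_a, id_b): similarity} for every unordered pair of patent IDs."""
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Hypothetical toy vectors keyed by made-up patent IDs (not real data set entries)
toy_vectors = {
    "patent_A": [1.0, 0.0, 0.0],
    "patent_B": [1.0, 1.0, 0.0],
    "patent_C": [0.0, 0.0, 2.0],
}

for pair, score in pairwise_similarities(toy_vectors).items():
    print(pair, round(score, 4))
# ('patent_A', 'patent_B') 0.7071
# ('patent_A', 'patent_C') 0.0
# ('patent_B', 'patent_C') 0.0
```

Because the number of pairs grows quadratically with the number of patents, computing every pair is only practical for small subsets. That cost is one reason a data set would ship a large batch of pre-calculated scores alongside the vectors.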
Keywords: Patent, Doc2Vec, Patent Similarity, Patent Distance
JEL Classification: K30, Y10, O34