Semantically-Based Patent Thicket Identification
71 Pages. Posted: 6 Mar 2020. Last revised: 20 Mar 2020
Date Written: March 1, 2020
Patent thickets have been identified as a major stumbling block in the development of new technologies, creating the need to identify thicket membership accurately. Various citation-based methodologies (Graevenitz et al., 2011; Clarkson, 2005) have been proposed, and these have relied on broad survey results (Cohen et al., 2000) for validation. Expert evaluation is an alternative, direct method of judging thicket membership at the individual patent level. While this method is potentially robust to drafting and jurisdictional differences in patent design, it is costly to use on a large scale. We employ a natural language processing technique that closely proxies expert views without incurring these large costs. Furthermore, we investigate the relation between our semantic measure and citation-based measures and find them to be quite distinct. We then combine a variety of thicket indicators in a statistical model to assess the probability that a newly added patent belongs to a thicket. We also study the role each measure plays, as part of creating a prospective screening model that could improve the efficiency of the patent system, in response to Lemley (2001).
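As a rough illustration of what a semantic distance between patent texts can look like, the sketch below compares texts by the cosine distance of their raw term-count vectors. It is not the measure used in the paper: it is a simpler stand-in that omits the dimensionality-reduction step of latent semantic analysis, and the function names and claim snippets are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(text_a, text_b):
    """Return 1 - cosine similarity of the two texts' term-count vectors."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: treat an empty text as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical claim snippets, for illustration only.
claim_1 = "A wireless antenna array for mobile signal transmission"
claim_2 = "An antenna array for wireless signal transmission in mobile devices"
claim_3 = "A pharmaceutical compound for treating hypertension"

near = cosine_distance(claim_1, claim_2)  # overlapping vocabulary
far = cosine_distance(claim_1, claim_3)   # little shared vocabulary
```

Under this toy measure, texts that share more vocabulary receive a smaller distance, so `near` comes out below `far`. A measure of this kind could in principle serve as one of several indicators in a model of thicket membership.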
Keywords: Patent Thicket, Intellectual Property, Semantic Distance, Latent Semantic Analysis, Natural Language Processing, Complexity
JEL Classification: L13, L20, O34