Investigating Cohort Similarity as an Ex Ante Alternative to Patent Forward Citations

33 Pages Posted: 28 May 2020 Last revised: 29 Oct 2020


Jonathan H. Ashtor

Paul, Weiss, Rifkind, Wharton & Garrison LLP; Benjamin N. Cardozo School of Law

There are 2 versions of this paper.

Date Written: December 2019


Forward citations are arguably the most widely used empirical metric for patents, including as indicators of patent information content, cumulative innovation, value, and knowledge flows. However, forward citations have major shortcomings. Citations require long time horizons to accrue, and therefore they cannot be observed until several years after a patent issues. Citation data are often noisy, discontinuous, and highly skewed, complicating empirical analysis. Moreover, recent studies have questioned the reliability of citation data. As such, the most widely used empirical metric of patents is also the most suspect. This study constructs a measure of patents that correlates with forward citations, but is observable ex ante, immediately upon patent issuance or even earlier upon publication of a patent application. In addition, this measure is continuous and evenly distributed, such that it is suitable for large‐scale patent analytics applications. Finally, unlike citations, the measure is portable across patent systems, facilitating cross‐border comparisons of portfolios and datasets. Specifically, I construct a measure of the similarity of a patent to its technological‐temporal cohort, based on linguistic analysis of claim text. I employ advanced computational linguistic techniques to analyze the claims of all U.S. patents issued in the period 1976–2017, over 6 million patents in total, and I calculate the average degree of conceptual similarity of each patented invention to all others in the same technology field and time period cohort. I then extend the methodology to all issued EP patents, over 1.6 million in total. I validate the resulting measures against multiple established patent metrics for U.S. and EP patents. I test the robustness of this measure as a forecast for future patent citations in empirical research and big‐data applications. I find that cohort similarity correlates significantly with forward citations received by both U.S. and EP patents. 
Cohort similarity also substitutes for citations in leading prior studies of R&D output and innovation. Finally, I demonstrate that, unlike citations, cohort similarity is comparable across the U.S. and EP patent systems. Accordingly, cohort similarity may be useful for empirical patent research, comparative studies of patent policy, and analytics of large‐scale patent portfolios.
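
The abstract describes cohort similarity as the average conceptual similarity of a patent's claim text to every other patent in the same technology-field and time-period cohort. The sketch below illustrates only that averaging structure. It is not the author's method: the paper's "advanced computational linguistic techniques" are not specified here, so this example swaps in a plain TF-IDF bag-of-words representation with cosine similarity as a stand-in. The record fields (`id`, `field`, `period`, `claims`), the helper names, and the toy classification codes are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters.
    # This is a crude stand-in for real claim parsing.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per tokenized document.

    Uses smoothed IDF, log((1 + n) / (1 + df)) + 1, so every weight is positive.
    This weighting is an illustrative assumption, not the paper's representation.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cohort_similarity(patents):
    """Map each patent id to its mean similarity with the other members of its cohort.

    A cohort is the (field, period) pair. A patent that is alone in its cohort
    gets None, because the average over zero peers is undefined.
    """
    vectors = tfidf_vectors([tokenize(p["claims"]) for p in patents])
    cohorts = defaultdict(list)
    for idx, p in enumerate(patents):
        cohorts[(p["field"], p["period"])].append(idx)

    scores = {}
    for members in cohorts.values():
        for i in members:
            peers = [j for j in members if j != i]
            if not peers:
                scores[patents[i]["id"]] = None
                continue
            total = sum(cosine(vectors[i], vectors[j]) for j in peers)
            scores[patents[i]["id"]] = total / len(peers)
    return scores
```

For example, two patents with identical claims and a third with no shared vocabulary, all in one hypothetical cohort, score about 0.5, 0.5, and 0.0. A patent alone in its cohort scores None. Because the score depends only on claim text and cohort membership, it can be computed as soon as the text is published, which is the ex ante property the abstract emphasizes.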

Suggested Citation

Ashtor, Jonathan H., Investigating Cohort Similarity as an Ex Ante Alternative to Patent Forward Citations (December 2019). Journal of Empirical Legal Studies, Vol. 16, Issue 4, pp. 848-880, 2019, Available at SSRN: or

Jonathan H. Ashtor (Contact Author)

Paul, Weiss, Rifkind, Wharton & Garrison LLP

New York, NY 10019
United States
212-373-3823 (Phone)

Benjamin N. Cardozo School of Law

United States
