Machine Learning and Natural Language Processing on the Patent Corpus: Data, Tools, and New Measures

19 Pages. Posted: 2 Aug 2018

Benjamin Balsmeier

KU Leuven - Department of Managerial Economics, Strategy, and Innovation

Mohamad Assaf

Universite du Luxembourg - Centre for Research in Economics and Management

Tyler Chesebro

Massachusetts Institute of Technology (MIT) - Electrical Engineering and Computer Science

Gabe Fierro

Massachusetts Institute of Technology (MIT) - Electrical Engineering and Computer Science

Kevin Johnson

University of California, Berkeley - Department of Electrical Engineering & Computer Sciences (EECS)

Scott Johnson

University of California, Berkeley - Department of Electrical Engineering & Computer Sciences (EECS)

Guan‐Cheng Li

University of California, Berkeley - Coleman Fung Institute for Engineering Leadership

Sonja Lück

University of Paderborn

Doug O'Reagan

University of California, Berkeley - Coleman Fung Institute for Engineering Leadership

Bill Yeh

Massachusetts Institute of Technology (MIT) - Electrical Engineering and Computer Science

Guangzheng Zang

Massachusetts Institute of Technology (MIT) - Electrical Engineering and Computer Science

Lee Fleming

Harvard University - Technology & Operations Management Unit

Date Written: Fall 2018

Abstract

Drawing upon recent advances in machine learning and natural language processing, we introduce new tools that automatically ingest, parse, disambiguate, and build an updated database using U.S. patent data. The tools identify unique inventor, assignee, and location entities mentioned on each granted U.S. patent from 1976 to 2016. We describe data flow, algorithms, user interfaces, descriptive statistics, and a novelty measure based on the first appearance of a word in the patent corpus. We illustrate an automated coinventor network mapping tool and visualize trends in patenting over the last 40 years.
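The first-appearance novelty measure mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a word is "novel" for a patent when it occurs in no patent granted on an earlier date. The `Patent` record, the lowercase regex tokenizer, ordering by grant date rather than application date, and letting same-day patents share credit for a new word are all hypothetical choices made for illustration. This is not the authors' pipeline, whose exact text fields and preprocessing are not described here.

```python
import re
from dataclasses import dataclass
from datetime import date
from itertools import groupby

# Hypothetical record type; the paper's actual database schema is not shown here.
@dataclass(frozen=True)
class Patent:
    number: str
    grant_date: date
    text: str

# Simple lowercase word tokenizer (an assumption; a production pipeline would
# likely add stop-word handling, normalization of chemical names, etc.).
TOKEN_RE = re.compile(r"[a-z][a-z0-9]*")

def tokenize(text: str) -> set[str]:
    return set(TOKEN_RE.findall(text.lower()))

def first_appearance_novelty(patents: list[Patent]) -> dict[str, list[str]]:
    """Map each patent number to the words found in no earlier-dated patent."""
    seen: set[str] = set()
    novel: dict[str, list[str]] = {}
    # groupby only merges consecutive items, so sort by date first.
    ordered = sorted(patents, key=lambda p: p.grant_date)
    for _, same_day in groupby(ordered, key=lambda p: p.grant_date):
        day_words: set[str] = set()
        for p in same_day:
            words = tokenize(p.text)
            # Compare only against strictly earlier dates, so two patents
            # granted on the same day can both introduce the same word.
            novel[p.number] = sorted(words - seen)
            day_words |= words
        seen |= day_words
    return novel

# Usage on a toy corpus (invented example data, not drawn from the paper).
corpus = [
    Patent("A", date(1976, 1, 6), "A valve for a pump"),
    Patent("B", date(1980, 5, 1), "A pump with a transistor"),
    Patent("C", date(1980, 5, 1), "A transistor gate"),
]
scores = first_appearance_novelty(corpus)
print({number: len(words) for number, words in scores.items()})
# → {'A': 4, 'B': 2, 'C': 2}
```

Patents "B" and "C" share a grant date, so under this tie rule both are credited with "transistor"; a stricter rule that credits only one patent per word would need an additional tie-breaker such as patent number.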

Keywords: database, disambiguation, machine learning, natural language processing, patent, social networks

JEL Classification: C80, C81, C88, O33, O34

Suggested Citation

Balsmeier, Benjamin and Assaf, Mohamad and Chesebro, Tyler and Fierro, Gabe and Johnson, Kevin and Johnson, Scott and Li, Guan‐Cheng and Lück, Sonja and O'Reagan, Doug and Yeh, Bill and Zang, Guangzheng and Fleming, Lee, Machine Learning and Natural Language Processing on the Patent Corpus: Data, Tools, and New Measures (Fall 2018). Journal of Economics & Management Strategy, Vol. 27, Issue 3, pp. 535-553, 2018. Available at SSRN: https://ssrn.com/abstract=3224864 or http://dx.doi.org/10.1111/jems.12259

Benjamin Balsmeier (Contact Author)

KU Leuven - Department of Managerial Economics, Strategy, and Innovation

Naamsestraat 69 bus 3500
Leuven, 3000
Belgium

Mohamad Assaf

Universite du Luxembourg - Centre for Research in Economics and Management

Luxembourg

Tyler Chesebro

Massachusetts Institute of Technology (MIT) - Electrical Engineering and Computer Science

77 Massachusetts Avenue
Cambridge, MA 02139-4307
United States

Gabe Fierro

Massachusetts Institute of Technology (MIT) - Electrical Engineering and Computer Science

77 Massachusetts Avenue
Cambridge, MA 02139-4307
United States

Kevin Johnson

University of California, Berkeley - Department of Electrical Engineering & Computer Sciences (EECS)

Berkeley, CA 94720-1712
United States

Scott Johnson

University of California, Berkeley - Department of Electrical Engineering & Computer Sciences (EECS)

Berkeley, CA 94720-1712
United States

Guan‐Cheng Li

University of California, Berkeley - Coleman Fung Institute for Engineering Leadership

130 Blum Hall #5580
Berkeley, CA 94720-5580
United States

Sonja Lück

University of Paderborn

Warburger Str. 100
Paderborn, D-33098
Germany

Doug O'Reagan

University of California, Berkeley - Coleman Fung Institute for Engineering Leadership

130 Blum Hall #5580
Berkeley, CA 94720-5580
United States

Bill Yeh

Massachusetts Institute of Technology (MIT) - Electrical Engineering and Computer Science

77 Massachusetts Avenue
Cambridge, MA 02139-4307
United States

Guangzheng Zang

Massachusetts Institute of Technology (MIT) - Electrical Engineering and Computer Science

77 Massachusetts Avenue
Cambridge, MA 02139-4307
United States

Lee Fleming

Harvard University - Technology & Operations Management Unit

Boston, MA 02163
United States
617 495 6613 (Phone)
617 496 5265 (Fax)
