Measuring Document Similarity with Weighted Averages of Word Embeddings

25 Pages. Posted: 20 Apr 2022. Last revised: 31 Jan 2023.

Date Written: September 2022

Abstract

We detail a methodology for estimating the textual similarity between two documents while accounting for the possibility that two different words can have a similar meaning. We illustrate the method's usefulness in facilitating comparisons between documents with very different formats and vocabularies by textually linking occupation task and industry output descriptions with related technologies as described in patent texts; we also examine economic applications of the resulting document similarity measures. In a final application, we demonstrate that the method also performs well relative to alternatives when comparing documents within the same domain: pairwise textual similarity between occupations' task descriptions strongly predicts the probability that a given worker will transition from one occupation to another. We close with suggestions for other potential uses and guidance on implementing the method.
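The idea named in the title, representing each document as a weighted average of its word embeddings and comparing documents by the similarity of those averages, can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, the embedding source, or the similarity metric, so everything below is an assumption made for illustration: the three-dimensional vectors in `EMBEDDINGS` are invented toy values, the weights are term counts times a smoothed inverse document frequency, and similarity is measured with cosine similarity.

```python
import math
from collections import Counter

# Hypothetical toy embeddings (invented values). A real application would
# load pretrained word vectors; the abstract does not say which ones.
EMBEDDINGS = {
    "repair":   [0.9, 0.1, 0.0],
    "fix":      [0.8, 0.2, 0.0],
    "engine":   [0.1, 0.9, 0.1],
    "motor":    [0.2, 0.8, 0.1],
    "contract": [0.0, 0.1, 0.9],
    "draft":    [0.1, 0.0, 0.8],
}


def idf_weights(corpus):
    """Smoothed inverse document frequency per word (assumed weighting)."""
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {
        word: math.log((1 + n_docs) / (1 + df)) + 1.0
        for word, df in doc_freq.items()
    }


def document_vector(tokens, idf):
    """Weighted average of word vectors; weight = term count * IDF."""
    counts = Counter(t for t in tokens if t in EMBEDDINGS)
    dim = len(next(iter(EMBEDDINGS.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, count in counts.items():
        weight = count * idf.get(word, 1.0)
        weight_sum += weight
        for i, value in enumerate(EMBEDDINGS[word]):
            total[i] += weight * value
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Three tiny "documents" with no words in common.
task = ["repair", "engine"]      # stand-in for an occupation task description
patent = ["fix", "motor"]        # stand-in for a related patent text
legal = ["draft", "contract"]    # stand-in for an unrelated document

idf = idf_weights([task, patent, legal])
sim_related = cosine_similarity(document_vector(task, idf),
                                document_vector(patent, idf))
sim_unrelated = cosine_similarity(document_vector(task, idf),
                                  document_vector(legal, idf))
print(sim_related > sim_unrelated)
```

In this toy setup, `task` and `patent` share no words, so a plain word-overlap measure would score them as unrelated. Because their words have nearby vectors, the embedding-based score ranks them as far more similar than `task` and `legal`, which is the behavior the abstract describes for words with similar meanings.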

Keywords: textual analysis, measuring document similarity

Suggested Citation

Seegmiller, Bryan and Papanikolaou, Dimitris and Schmidt, Lawrence, Measuring Document Similarity with Weighted Averages of Word Embeddings (September 2022). Available at SSRN: https://ssrn.com/abstract=4088443 or http://dx.doi.org/10.2139/ssrn.4088443

Bryan Seegmiller (Contact Author)

Northwestern University - Kellogg School of Management

2001 Sheridan Road
Evanston, IL 60208
United States

Dimitris Papanikolaou

Northwestern University - Kellogg School of Management - Department of Finance

Evanston, IL 60208
United States

National Bureau of Economic Research (NBER)

1050 Massachusetts Avenue
Cambridge, MA 02138
United States

Lawrence Schmidt

MIT Sloan School of Management

77 Massachusetts Avenue
Cambridge, MA 02139-4307
United States

HOME PAGE: https://sites.google.com/site/lawrencedwschmidt/home
