Collapsing Corporate Confusion: Leveraging Network Structures for Effective Entity Resolution in Relational Corporate Data

12 Pages Posted: 16 Oct 2017

Tim Marple

University of California, Berkeley

Bruce A. Desmarais

Pennsylvania State University

Kevin Young

University of Massachusetts Amherst

Date Written: October 12, 2017

Abstract

In this paper, we introduce a novel battery of classifiers to resolve inconsistencies among entity names within large corporate datasets. Using data on the corporate sector, we describe our relational approach to entity resolution and the problems in existing approaches that it addresses. We leverage the relational structure of BoardEx employment data to test the efficacy of these classifiers using a ground-truth sample of coded name inconsistencies. We show that these classifiers accurately resolve such inconsistencies, and further show the effect of this resolution on network topology. We conclude with implications for existing findings and steps for future work.

Keywords: entity resolution, network methods, corporate data, BoardEx

Suggested Citation

Marple, Tim and Desmarais, Bruce A. and Young, Kevin, Collapsing Corporate Confusion: Leveraging Network Structures for Effective Entity Resolution in Relational Corporate Data (October 12, 2017). Available at SSRN: https://ssrn.com/abstract=3053632 or http://dx.doi.org/10.2139/ssrn.3053632

Tim Marple (Contact Author)

University of California, Berkeley

310 Barrows Hall
Berkeley, CA 94720
United States

Bruce A. Desmarais

Pennsylvania State University

University Park, State College, PA 16801
United States

HOME PAGE: http://sites.psu.edu/desmaraisgroup

Kevin Young

University of Massachusetts Amherst

Department of Political Science
Amherst, MA 01003
United States
