Link Prediction Based on Graph Topology: The Predictive Value of Generalized Clustering Coefficient

31 Pages Posted: 4 Jul 2010

Zan Huang

Pennsylvania State University

Date Written: March 2, 2010

Abstract

Predicting linkages among data objects is a fundamental task for applications in various domains. In many contexts, link prediction is based entirely on the linkage information itself. Link-structure-based link prediction is closely related to a parallel and almost separate stream of research on topological modeling of large-scale graphs. The well-studied topological measures that summarize the global structure of a graph, such as the clustering coefficient, average path length, and degree distribution, have direct implications for link prediction. This paper is an initial effort to explore the connection between link prediction and graph topology, and it focuses on the predictive value of the clustering coefficient measure. The standard clustering coefficient measure is generalized to capture higher-order clustering tendencies. The proposed framework consists of a cycle formation link probability model, a procedure for estimating model parameters from the observed generalized clustering coefficients, and model-based generation of link predictions. Using the Enron email dataset and a Facebook dataset, we demonstrate that the proposed cycle formation model corresponded closely with the actual link probabilities and that the link prediction algorithm based on this model outperformed many existing algorithms.
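
To make the idea of a higher-order clustering coefficient concrete, here is a minimal sketch. The abstract does not state the paper's exact definition, so the code assumes one plausible reading: for a given k, take every simple path with k edges and measure the fraction whose two endpoints are themselves linked, so that the path closes into a cycle of length k + 1. Under this assumption, k = 2 reduces to the standard global clustering coefficient (closed triples over all connected triples). The names build_adjacency and generalized_clustering are hypothetical and are not taken from the paper.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def generalized_clustering(adj, k):
    """Fraction of simple k-edge paths whose endpoints are directly linked.

    ASSUMPTION: this is an illustrative definition, not necessarily the one
    used in the paper. Each path is counted once per direction, which leaves
    the ratio unchanged.
    """
    if k < 2:
        raise ValueError("k must be at least 2")

    total = 0   # simple paths with exactly k edges
    closed = 0  # those whose endpoints share an edge

    def walk(start, node, visited, depth):
        nonlocal total, closed
        if depth == k:
            total += 1
            if start in adj[node]:
                closed += 1
            return
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                walk(start, nxt, visited, depth + 1)
                visited.remove(nxt)

    for s in adj:
        walk(s, s, {s}, 0)

    return closed / total if total else 0.0


# A triangle a-b-c with a pendant node d attached to c.
triangle_with_tail = build_adjacency(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
)
# A plain 4-cycle: no triangles, but every 3-edge path closes into the cycle.
square = build_adjacency([("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])

c2_tail = generalized_clustering(triangle_with_tail, 2)
c2_square = generalized_clustering(square, 2)
c3_square = generalized_clustering(square, 3)
```

The 4-cycle shows why a higher-order measure can matter: its standard coefficient (k = 2) is zero, yet every 3-edge path closes into a cycle, a tendency the standard measure cannot see. The exhaustive path enumeration is exponential in k and is meant only for small illustrative graphs.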

Keywords: link prediction, graphs and networks, data mining

Suggested Citation

Huang, Zan, Link Prediction Based on Graph Topology: The Predictive Value of Generalized Clustering Coefficient (March 2, 2010). Available at SSRN: https://ssrn.com/abstract=1634014 or http://dx.doi.org/10.2139/ssrn.1634014

Zan Huang (Contact Author)

Pennsylvania State University

University Park
State College, PA 16802
United States
