Link Prediction Based on Graph Topology: The Predictive Value of Generalized Clustering Coefficient
31 Pages. Posted: 4 Jul 2010
Date Written: March 2, 2010
Predicting links among data objects is a fundamental task in many application domains. In many contexts, link prediction relies entirely on the linkage information itself. Link-structure-based link prediction is closely related to a parallel, largely separate stream of research on the topological modeling of large-scale graphs. Well-studied topological measures that summarize the global structure of a graph, such as the clustering coefficient, average path length, and degree distribution, have direct implications for link prediction. This paper is an initial effort to explore the connection between link prediction and graph topology, focusing on the predictive value of the clustering coefficient. The standard clustering coefficient is generalized to capture higher-order clustering tendencies. The proposed framework consists of a cycle formation link probability model, a procedure for estimating the model parameters from the observed generalized clustering coefficients, and model-based generation of link predictions. Using the Enron email dataset and a Facebook dataset, we demonstrate that the proposed cycle formation model corresponds closely to the actual link probabilities and that the link prediction algorithm based on this model outperforms many existing algorithms.
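To make the idea of higher-order clustering concrete, the following Python sketch computes one plausible generalization of the global clustering coefficient. The abstract does not state the paper's exact definition, so the formula used here is an assumption: for a given k, it is the fraction of simple paths on k nodes whose two endpoints are themselves linked, that is, paths that close into a cycle of length k. For k = 3 this reduces to the standard global clustering coefficient (closed triples over connected triples). The function names `build_adjacency` and `generalized_clustering` are hypothetical and chosen for illustration only.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def generalized_clustering(adj, k):
    """Fraction of simple paths on k nodes whose endpoints are linked.

    Assumed definition (not taken from the paper): a path on k nodes is
    'closed' if its first and last nodes are adjacent, forming a k-cycle.
    Each undirected path is enumerated once per direction, which doubles
    both counts and leaves the ratio unchanged.
    """
    if k < 3:
        raise ValueError("k must be at least 3")

    total = 0   # simple paths on k nodes (counted in both directions)
    closed = 0  # those whose endpoints are adjacent

    def extend(path, visited):
        nonlocal total, closed
        if len(path) == k:
            total += 1
            if path[0] in adj[path[-1]]:
                closed += 1
            return
        for nxt in adj[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in adj:
        extend([start], {start})

    return closed / total if total else 0.0


# Toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(generalized_clustering(toy, 3))  # 3 of 5 connected triples are closed: 0.6

# A 4-cycle has no triangles, but every path on 4 nodes closes into the cycle.
square = build_adjacency([(1, 2), (2, 3), (3, 4), (4, 1)])
print(generalized_clustering(square, 3))  # 0.0
print(generalized_clustering(square, 4))  # 1.0
```

The exhaustive path enumeration is exponential in k and is only practical for small graphs and small k; it is meant to illustrate the quantity, not to reproduce the estimation procedure or the cycle formation model described above.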
Keywords: link prediction, graphs and networks, data mining