Social Network Signatures: A Framework for Re-Identification in Networked Data

Shawndra Hill
University of Pennsylvania

Akash Nagle
University of Pennsylvania


February 11, 2009


Abstract:
Data on large dynamic social networks, such as telecommunications networks and the Internet, are pervasive. However, representing these networks in a manner that is conducive to efficient large-scale analysis is often a challenge. In this paper, we focus on the analysis task of re-identification. Re-identification in the context of dynamic networks is essentially a matching problem that involves comparing the behavior of networked entities across two time periods. An entity's social network behavior can be represented as a "signature." A similarity score that measures the degree of overlap in signatures can be assigned to pairs of entities observed across specified time periods. The score can then be used as an attribute in a predictive model to classify pairs of entities as matching or non-matching. Prior research has reported success in the domains of e-mail alias detection, author attribution, and identifying fraudulent consumers in the telecommunications industry. In this work, we address the question, "Why are we able to re-identify entities on real-world dynamic networks?" Our contribution is two-fold. First, we address the challenge of scale with a framework for matching that does not require pair-wise comparisons to ascertain the similarity scores. We assume a random network structure to estimate performance and show that our estimates are good predictors for simulated networks with different characteristics, including clustering coefficient, average degree, and size, and for different network types such as random, small-world, and scale-free. Second, we show that our method is robust against missing links in the second time period but less tolerant of noise, which is modeled by changes in behavior from the first to the second time period. Using our framework, we provide a performance estimate for prediction on networks based solely on their degree distribution and dynamics.
This work has significant implications for re-identification problems where scale is a challenge, as well as for settings where false negatives (e.g., fraudulent consumers who are not labeled as fraudulent) cannot be observed.
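
As an illustration only, the signature-and-overlap idea described in the abstract might be sketched as follows. The abstract does not specify how signatures or similarity scores are defined, so everything here is an assumption: the top-k-contacts signature, the Jaccard overlap score, the inverted index used to avoid scoring every pair, and the names `signature`, `jaccard`, and `candidate_scores` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def signature(edges, entity, k=3):
    """Assumed signature: the k most frequent contacts of `entity`.

    `edges` is a list of (source, destination) pairs for one time period.
    """
    counts = defaultdict(int)
    for src, dst in edges:
        if src == entity:
            counts[dst] += 1
    # Sort by descending frequency, then by name so ties are deterministic.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return frozenset(contact for contact, _ in top)


def jaccard(a, b):
    """Assumed similarity score: size of the overlap over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_scores(sigs_t1, sigs_t2):
    """Score only cross-period pairs that share at least one contact.

    An inverted index from contact to period-1 entities stands in for
    "no pair-wise comparisons"; pairs with no shared contact are never scored.
    """
    index = defaultdict(set)
    for entity, sig in sigs_t1.items():
        for contact in sig:
            index[contact].add(entity)

    scores = {}
    for entity2, sig2 in sigs_t2.items():
        candidates = set()
        for contact in sig2:
            candidates |= index.get(contact, set())
        for entity1 in candidates:
            scores[(entity1, entity2)] = jaccard(sigs_t1[entity1], sig2)
    return scores
```

In this sketch, the resulting scores would then serve as one attribute in a separate classifier that labels pairs as matching or non-matching, as the abstract describes; the classifier itself is not shown.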

Keywords: social networks, network-based re-identification, statistical relational learning

Working Paper Series

Date posted: February 12, 2009; Last revised: May 4, 2009

Suggested Citation

Hill, Shawndra and Nagle, Akash, Social Network Signatures: A Framework for Re-Identification in Networked Data (February 11, 2009). Available at SSRN: http://ssrn.com/abstract=1341394



Contact Information

Shawndra Hill (Contact Author)
University of Pennsylvania
3641 Locust Walk
Philadelphia, PA 19104-6365
United States
HOME PAGE: http://www.wharton.upenn.edu/faculty/hill.html

Akash Nagle
University of Pennsylvania
Philadelphia, PA 19104
United States


Paper statistics
Abstract Views: 187
Downloads: 58
Download Rank: 115,803
References: 14
