Social Network Signatures: A Framework for Re-Identification in Networked Data
Shawndra Hill
University of Pennsylvania

Akash Nagle
University of Pennsylvania

February 11, 2009

Abstract:
Data on large dynamic social networks, such as telecommunications networks and the Internet, are pervasive. However, representing these networks in a manner that is conducive to efficient large-scale analysis is often a challenge. In this paper, we focus on the analysis task of re-identification. Re-identification in the context of dynamic networks is essentially a matching problem that involves comparing the behavior of networked entities across two time periods. An entity's social network behavior can be represented as a "signature." A similarity score that measures the degree of overlap in signatures can be assigned to pairs of entities observed across specified time periods. The score can then be used as an attribute in a predictive model to classify pairs of entities as matching or non-matching. Prior research has reported success in the domains of e-mail alias detection, author attribution, and identifying fraudulent consumers in the telecommunications industry. In this work, we address the question of "why are we able to re-identify entities on real world dynamic networks?" Our contribution is two-fold. First, we address the challenge of scale with a framework for matching that does not require pair-wise comparisons to ascertain the similarity scores. We assume a random network structure to estimate performance and show that our estimates are good predictors for simulated networks with different characteristics including clustering coefficient, average degree, size, and different network types such as random, small world and scale-free. Second, we show our method is robust against missing links in the second time period but less tolerant to noise, which is modeled by changes in behavior from the first to second time period.
Using our framework, we provide a performance estimate for prediction on networks based solely on their degree distribution and dynamics. This work has significant implications for re-identification problems where scale is a challenge as well as when false negatives (e.g., when fraudulent consumers are not labeled as fraudulent) cannot be observed.
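To make the notions of a "signature" and a similarity score concrete, the following is a minimal sketch and not the authors' method. It assumes a signature is the set of an entity's k most frequent contacts and that overlap is measured with the Jaccard index; the abstract specifies neither choice. The function names (`signature`, `jaccard`, `candidate_scores`) and the toy data are hypothetical. An inverted index over contacts illustrates one generic way to avoid scoring every pair of entities.

```python
from collections import Counter, defaultdict


def signature(contacts, k=3):
    """Assumed signature: the set of an entity's k most frequent contacts."""
    return {c for c, _ in Counter(contacts).most_common(k)}


def jaccard(a, b):
    """Assumed similarity score: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_scores(sigs_t1, sigs_t2):
    """Score only cross-period pairs that share at least one contact."""
    # Inverted index: contact -> period-2 entities whose signature has it.
    index = defaultdict(set)
    for entity, sig in sigs_t2.items():
        for contact in sig:
            index[contact].add(entity)

    scores = {}
    for e1, sig1 in sigs_t1.items():
        for contact in sig1:
            for e2 in index.get(contact, ()):
                scores[(e1, e2)] = jaccard(sig1, sigs_t2[e2])
    return scores


# Toy data (hypothetical): entity -> list of observed contacts per period.
period1 = {"a": ["x", "y", "y", "z"], "b": ["p", "q"]}
period2 = {"a2": ["x", "y", "w"], "b2": ["p", "q", "q"], "c2": ["m"]}

sigs_t1 = {e: signature(c) for e, c in period1.items()}
sigs_t2 = {e: signature(c) for e, c in period2.items()}
scores = candidate_scores(sigs_t1, sigs_t2)
```

In this toy example only the pairs `("a", "a2")` and `("b", "b2")` receive a score, because no other cross-period pair shares a contact. Such scores could then serve as an attribute in a classifier that labels pairs as matching or non-matching, as the abstract describes.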
Keywords: social networks, network-based re-identification, statistical relational learning

Working Paper Series

Date posted: February 12, 2009; Last revised: May 04, 2009
© 2010 Social Science Electronic Publishing, Inc. All Rights Reserved.