Data Privacy of Online Reviews

Posted: 6 May 2020

Date Written: January 20, 2020


Online reviews are an important source of information on products and services for consumers and firms. Although incentivizing high-quality reviews is an important business objective for any review platform, we show that it is also possible to identify anonymous reviewers by exploiting the characteristics of posted reviews. Using data from major review platforms and our two-stage de-anonymization methodology, we demonstrate that the ability to identify an author is determined primarily by the amount and granularity of structured data (e.g., location, first name) posted with the review and secondarily by the author’s writing style across reviews. When the number of potential authors with identical structured data ranges from 100 to 5 and sufficient training data exists for text analysis, the average probabilities of identification range from 40 to 81%. Our findings suggest that review platforms concerned with the potential negative effects of privacy-related incidents should limit or aggregate their reviewers’ structured data when it is adjoined with textual content or mentioned in the text itself. We also show that although protection policies that focus on structured data remove the most predictive elements of authorship, they also have a small negative effect on the usefulness of reviews.
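The two-stage idea in the abstract (narrow the pool of potential authors using structured data, then compare writing style) can be illustrated with a minimal sketch. This is not the authors' methodology: the field names (`first_name`, `location`, `known_texts`), the toy candidate records, and the use of character-trigram cosine similarity as a stand-in for writing-style analysis are all assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, a simple proxy for writing style."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def stage_one(candidates, first_name, location):
    """Stage 1 (assumed form): keep candidates whose structured data matches."""
    return [c for c in candidates
            if c["first_name"] == first_name and c["location"] == location]


def stage_two(review_text, candidates):
    """Stage 2 (assumed form): rank remaining candidates by style similarity."""
    target = char_ngrams(review_text)
    scored = [(cosine(target, char_ngrams(" ".join(c["known_texts"]))), c["id"])
              for c in candidates]
    return sorted(scored, reverse=True)


# Hypothetical candidate authors with known (training) texts.
candidates = [
    {"id": "u1", "first_name": "Ana", "location": "Ithaca",
     "known_texts": ["Honestly, the pasta was superb!! Will def return.",
                     "Honestly, great service!! Def recommend."]},
    {"id": "u2", "first_name": "Ana", "location": "Ithaca",
     "known_texts": ["The establishment offers adequate food at a reasonable price.",
                     "Service was acceptable; the decor is dated."]},
    {"id": "u3", "first_name": "Ben", "location": "Ithaca",
     "known_texts": ["Honestly, loved it!!"]},
]

# An anonymous review posted with first name "Ana" and location "Ithaca".
pool = stage_one(candidates, "Ana", "Ithaca")
ranking = stage_two("Honestly, the tacos were superb!! Def coming back.", pool)
for score, author_id in ranking:
    print(author_id, round(score, 3))
```

In this toy setting the structured data alone shrinks the pool from three candidates to two, which mirrors the abstract's point that structured data is the primary driver of identifiability; the text comparison only has to separate the authors who remain.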

Keywords: data privacy, online reviews, de-anonymization, writing style, structured data

Suggested Citation

Schneider, Matthew and Mankad, Shawn, Data Privacy of Online Reviews (January 20, 2020). Available at SSRN: or

Shawn Mankad

Cornell University

Ithaca, NY 14853
United States
6072559594 (Phone)
