Big Data: Pitfalls, Methods and Concepts for an Emergent Field

Zeynep Tufekci

Princeton University - Center for Information Technology Policy; University of North Carolina (UNC) at Chapel Hill

March 7, 2013

Big Data, large-scale aggregate databases of imprints of online and social media activity, has captured scientific and policy attention. However, this emergent field is challenged by inadequate attention to methodological and conceptual issues.

I review key methodological and conceptual challenges including: 1) Inadequate attention to the implicit and explicit structural biases of the platform(s) most frequently used to generate datasets (the model organism problem). 2) The common practice of selecting on the dependent variable without corresponding attention to the complications of this path. 3) Lack of clarity with regard to sampling, universe and representativeness (the denominator problem). 4) Reliance on a single platform in most big data analyses (hence missing the ecology of information flows).

Conceptual issues reviewed in this paper include: 1) More research is needed to interpret aggregated mediated interactions. Clicks, status updates, links, retweets, etc. are complex social interactions. 2) Network methods imported from other fields need to be carefully reconsidered to evaluate appropriateness for analyzing human social media imprints. 3) Most big datasets contain information only on “node-to-node” interaction. However, “field” effects – events that affect a society or a group in a wholesale fashion either through shared experience or through broadcast media – are an important part of human socio-cultural experience. 4) Human reflexivity – that humans will alter behaviors around metrics – needs to be assumed and built into the analysis. 5) Assuming additivity and counting interactions so that each new interaction is seen as (n+1) without regard to the semantics or context can be misleading. 6) The relationship between network structure and other attributes is complex and multi-faceted.

Number of Pages in PDF File: 24

Keywords: big data, social science, Twitter, Facebook, computer science, data science

Date posted: March 8, 2013  

Suggested Citation

Tufekci, Zeynep, Big Data: Pitfalls, Methods and Concepts for an Emergent Field (March 7, 2013). Available at SSRN: https://ssrn.com/abstract=2229952 or http://dx.doi.org/10.2139/ssrn.2229952

Contact Information

Zeynep Tufekci (Contact Author)
Princeton University - Center for Information Technology Policy
C231A E-Quad
Olden St.
Princeton, NJ 08540
United States
University of North Carolina (UNC) at Chapel Hill
CB #3265
Chapel Hill, NC 27599
United States

Paper statistics
Abstract Views: 9,980
Downloads: 1,724
Download Rank: 6,418