Big Data: Pitfalls, Methods and Concepts for an Emergent Field

24 Pages. Posted: 8 Mar 2013

Zeynep Tufekci

Princeton University - Center for Information Technology Policy; University of North Carolina (UNC) at Chapel Hill

Date Written: March 7, 2013

Abstract

Big Data, large-scale aggregate databases of imprints of online and social media activity, has captured scientific and policy attention. However, this emergent field is challenged by inadequate attention to methodological and conceptual issues.

I review key methodological and conceptual challenges including: 1) Inadequate attention to the implicit and explicit structural biases of the platform(s) most frequently used to generate datasets (the model organism problem). 2) The common practice of selecting on the dependent variable without corresponding attention to the complications of this path. 3) Lack of clarity with regard to sampling, universe and representativeness (the denominator problem). 4) Most big data analyses come from a single platform (hence missing the ecology of information flows).

Conceptual issues reviewed in this paper include: 1) More research is needed to interpret aggregated mediated interactions. Clicks, status updates, links, retweets, etc. are complex social interactions. 2) Network methods imported from other fields need to be carefully reconsidered to evaluate appropriateness for analyzing human social media imprints. 3) Most big datasets contain information only on “node-to-node” interaction. However, “field” effects – events that affect a society or a group in a wholesale fashion either through shared experience or through broadcast media – are an important part of human socio-cultural experience. 4) Human reflexivity – that humans will alter behaviors around metrics – needs to be assumed and built into the analysis. 5) Assuming additivity and counting interactions so that each new interaction is seen as (n+1) without regard to the semantics or context can be misleading. 6) The relationship between network structure and other attributes is complex and multi-faceted.

Keywords: big data, social science, Twitter, Facebook, computer science, data science

Suggested Citation

Tufekci, Zeynep, Big Data: Pitfalls, Methods and Concepts for an Emergent Field (March 7, 2013). Available at SSRN: https://ssrn.com/abstract=2229952 or http://dx.doi.org/10.2139/ssrn.2229952

Zeynep Tufekci (Contact Author)

Princeton University - Center for Information Technology Policy

C231A E-Quad
Olden Street
Princeton, NJ 08540
United States

University of North Carolina (UNC) at Chapel Hill

102 Ridge Road
Chapel Hill, NC 27514
United States
