Big Data: Pitfalls, Methods and Concepts for an Emergent Field
24 Pages. Posted: 8 Mar 2013
Date Written: March 7, 2013
Abstract
Big Data, large-scale aggregate databases of imprints of online and social media activity, has captured scientific and policy attention. However, this emergent field is challenged by inadequate attention to methodological and conceptual issues.
I review key methodological challenges, including: 1) Inadequate attention to the implicit and explicit structural biases of the platform(s) most frequently used to generate datasets (the model organism problem). 2) The common practice of selecting on the dependent variable without corresponding attention to the complications this introduces. 3) Lack of clarity with regard to sampling, universe and representativeness (the denominator problem). 4) Reliance on a single platform in most big data analyses, which misses the broader ecology of information flows.
Conceptual issues reviewed in this paper include: 1) More research is needed to interpret aggregated mediated interactions. Clicks, status updates, links, retweets, etc. are complex social interactions. 2) Network methods imported from other fields need to be carefully reconsidered to evaluate their appropriateness for analyzing human social media imprints. 3) Most big datasets contain information only on “node-to-node” interactions. However, “field” effects (events that affect a society or a group in a wholesale fashion, either through shared experience or through broadcast media) are an important part of human socio-cultural experience. 4) Human reflexivity, the tendency of humans to alter their behavior around metrics, needs to be assumed and built into the analysis. 5) Assuming additivity and counting interactions so that each new interaction is seen as (n + 1), without regard to semantics or context, can be misleading. 6) The relationship between network structure and other attributes is complex and multi-faceted.
Keywords: big data, social science, Twitter, Facebook, computer science, data science