An Introductory Guide to Data Science: The Terminological Landscape
37 Pages Posted: 24 Apr 2017
Date Written: February 20, 2017
The emerging field of data science has rapidly evolved into an extremely diverse field equipped with multi-disciplinary techniques to extract, analyze and classify structured and unstructured data. These methods offer researchers, policy analysts, and the lay public evidence-based insights into a tremendous range of human, organizational, and societal activities on a scale and scope that has rarely been possible with conventional scientific methods. At present, however, the multi-disciplinary nature of the data science space suffers a ‘language’ problem insofar as data scientists from different fields often use different terms to describe common methods and concepts. The aim of the present research is threefold. First, we report results of a literature review that identifies and defines the essential content domain of data science, with special focus on the classification of data collection techniques. Second, we establish a preliminary set of relationships among the most trafficked terms of data science to facilitate interdisciplinary communication among scientists from heterogeneous fields. And third, we develop a classification scheme of web-scraping methods based on their availability, the quality of the data procured by the method, the ease of data extraction, reproducibility, the technical skills required to leverage each method, and the types of data collected by each method.
Keywords: Data science; web scraping; data collection; computational social science
Suggested Citation: Suggested Citation