Obtaining Data from the Internet: A Guide to Data Crawling in Management Research

38 Pages Posted: 20 Jun 2019

Jörg Claussen

Ludwig Maximilian University of Munich (LMU) - Faculty of Business Administration (Munich School of Management); Copenhagen Business School - Department of Innovation and Organizational Economics

Christian Peukert

University of Lausanne (HEC)

Date Written: June 2019

Abstract

The increasing availability of data on the Internet opens new opportunities for management research, and data crawling can be used for automated large-scale data extraction. We show that data crawling has quickly gained popularity and is used for a wide variety of purposes, but has so far gained less traction in the field of management. We argue that many data sets used in other disciplines could also be used to answer questions in management research, and we show that setting up a data crawler does not require advanced programming skills. However, many pitfalls can undermine the use of crawled data for research. We develop a guideline for crawling projects and discuss how many of the regularly occurring challenges can be addressed.

Keywords: Crawler, Spider, Scrape, Bot, Data

JEL Classification: M1, C81

Suggested Citation

Claussen, Jörg and Peukert, Christian, Obtaining Data from the Internet: A Guide to Data Crawling in Management Research (June 2019). Available at SSRN: https://ssrn.com/abstract=3403799 or http://dx.doi.org/10.2139/ssrn.3403799

Jörg Claussen (Contact Author)

Ludwig Maximilian University of Munich (LMU) - Faculty of Business Administration (Munich School of Management)

Kaulbachstr. 45
Munich, DE 80539
Germany

Copenhagen Business School - Department of Innovation and Organizational Economics

Kilevej 14A
Frederiksberg, 2000
Denmark

Christian Peukert

University of Lausanne (HEC)

Unil Dorigny, Batiment Internef
Lausanne, 1015
Switzerland
