Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating, and Analyzing Large-Scale E-Commerce Data

Statistical Sciences, Forthcoming

36 Pages. Posted: 14 Nov 2006

Ravi Bapna

Indian School of Business

Paulo Goes

University of Arizona - Department of Management Information Systems

Ram D. Gopal

University of Connecticut - Department of Operations & Information Management

James R. Marsden

University of Connecticut - Department of Operations & Information Management

Abstract

Widespread e-commerce activity on the Internet has created new opportunities to collect vast amounts of micro-level market and non-market data. In this paper we share our experiences in collecting, validating, storing, and analyzing large Internet-based data sets in the areas of online auctions, music file sharing, and online retailer pricing. We demonstrate how such data can advance knowledge by facilitating sharper and more extensive tests of existing theories and by offering observational underpinnings for the development of new theories. Just as experimental economics pushed the frontiers of economic thought by enabling numerous theories of economic behavior to be tested in a controlled laboratory, we believe that observing real-world agents participating in market and non-market activity on the Internet, often over extended periods of time, can lead us to develop and test a variety of new theories. Internet data gathering is not controlled experimentation: we cannot randomly assign participants to treatments or determine event orderings. It does, however, offer potentially large data sets with repeated observations of individual choices and actions. In addition, automated data collection holds promise for a greatly reduced cost per observation. Our methods rely on technological advances in automated data collection agents. Significant challenges remain in developing appropriate sampling techniques, integrating data from heterogeneous sources in a variety of formats, constructing generalizable processes, and understanding legal constraints. Despite these challenges, the early evidence from those who have harvested and analyzed large amounts of e-commerce data points towards a significant leap in our ability to understand the functioning of electronic commerce.

Keywords: Large-scale Internet data, web crawling agents, online auctions, music file sharing

Suggested Citation

Bapna, Ravi and Goes, Paulo and Gopal, Ram D. and Marsden, James R., Moving from Data-Constrained to Data-Enabled Research: Experiences and Challenges in Collecting, Validating, and Analyzing Large-Scale E-Commerce Data. Statistical Sciences, Forthcoming, Available at SSRN: https://ssrn.com/abstract=944745

Ravi Bapna (Contact Author)

Indian School of Business

Hyderabad, Gachibowli 500 019
India
+91 40 23187156 (Phone)

Paulo Goes

University of Arizona - Department of Management Information Systems

AZ
United States

Ram D. Gopal

University of Connecticut - Department of Operations & Information Management

368 Fairfield Road
Storrs, CT 06269-2041
United States

James R. Marsden

University of Connecticut - Department of Operations & Information Management

368 Fairfield Road
Storrs, CT 06269-2041
United States
