Elements of Optimal Predictive Modeling Success in Data Science: An Analysis of Survey Data for the ‘Give Me Some Credit’ Competition Hosted on Kaggle

35 Pages; Posted: 3 Mar 2013

Date Written: March 2, 2013

Abstract

In September 2011, a three-month credit scoring contest called “Give Me Some Credit” was hosted on the Kaggle predictive modeling platform. It was the most popular Kaggle crowdsourcing contest to date, and the intense competition drew some of the best data scientists and credit scoring practitioners in the world. The contest results supported the hypothesis that credit scoring is a commodity that cannot create sustainable competitive advantage. The survey data also provided insight into predictive modeling skill and the factors most important to optimal predictive modeling: model selection (the top 3 models used were hybrid models of random forest, support vector machines and gradient boosted machines), proficiency in predictive modeling, effort as proxied by the number of methods explored, team size (more people generally resulted in better models), and domain knowledge. Thus predictive model performance is a function of these factors: Performance = f(chosen model, effort, predictive modeling skill, domain knowledge, teamwork), where the chosen model reflects exploring multiple models and choosing the best algorithm, and effort reflects hard work. The choice of the right kind of algorithm or model had the largest effect on performance, and exploration of different approaches mattered more than background or experience.
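
The "hybrid models" mentioned in the abstract can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the function names `blend_scores` and `auc`, the toy labels, and the per-model scores are hypothetical, and simple averaging is only one way to combine models; the abstract does not say how the top entrants built their hybrids or which metric ranked them.

```python
def blend_scores(model_scores, weights=None):
    """Combine per-model default probabilities into one hybrid score per applicant.

    model_scores maps a model name to a list of probabilities, one per applicant.
    With weights=None every model counts equally (a plain average).
    """
    names = list(model_scores)
    n = len(model_scores[names[0]])
    if any(len(model_scores[m]) != n for m in names):
        raise ValueError("every model must score the same applicants")
    if weights is None:
        weights = {m: 1.0 for m in names}
    total = sum(weights[m] for m in names)
    return [
        sum(weights[m] * model_scores[m][i] for m in names) / total
        for i in range(n)
    ]


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen positive case is scored
    above a randomly chosen negative case; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = borrower defaulted, 0 = did not.
labels = [1, 0, 1, 0, 0]
scores = {
    "random_forest": [0.80, 0.30, 0.40, 0.50, 0.10],
    "svm": [0.60, 0.65, 0.70, 0.20, 0.30],
    "gradient_boosting": [0.90, 0.20, 0.35, 0.40, 0.15],
}

hybrid = blend_scores(scores)
for name, s in scores.items():
    print(name, round(auc(labels, s), 3))
print("hybrid", round(auc(labels, hybrid), 3))
```

On these made-up numbers each single model misranks one pair of applicants while the averaged score ranks all of them correctly. That illustrates why blending can help; it is not evidence for the paper's finding.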

Keywords: data science, predictive modeling, successful factors with optimal models, predictive model success, survey of modelers, crowdsourcing, modeler performance

Suggested Citation

Sharma, Dhruv, Elements of Optimal Predictive Modeling Success in Data Science: An Analysis of Survey Data for the ‘Give Me Some Credit’ Competition Hosted on Kaggle (March 2, 2013). Available at SSRN: https://ssrn.com/abstract=2227333 or http://dx.doi.org/10.2139/ssrn.2227333

Dhruv Sharma (Contact Author)

Independent

2023 N. Cleveland St.
Arlington, VA 22201
United States

HOME PAGE: http://theinterdisciplinarian.com/

