Elements of Optimal Predictive Modeling Success in Data Science: An Analysis of Survey Data for the ‘Give Me Some Credit’ Competition Hosted on Kaggle
35 Pages
Posted: 3 Mar 2013
Date Written: March 2, 2013
In September 2011, a three-month credit-scoring contest called “Give Me Some Credit” was hosted on the Kaggle predictive modeling platform. It was the most popular Kaggle crowdsourcing contest to date, drew intense competition, and attracted top data scientists and credit scoring practitioners from around the world. The contest results supported the hypothesis that credit scoring is a commodity that cannot create a sustainable competitive advantage. The survey data also provided insight into predictive modeling skill and into the factors that matter most for optimal predictive modeling: optimal model selection (the top three models used were hybrids of random forests, support vector machines, and gradient boosted machines), proficiency in predictive modeling, effort (proxied by the number of methods explored), team size (larger teams generally produced better models), and domain knowledge. Predictive model performance is therefore a function of these factors: Performance = f(Chosen Model (exploring multiple models and choosing the best algorithm), Effort (hard work), Predictive Modeling Skill, Domain Knowledge, Teamwork). The choice of the right kind of algorithm or model had the largest effect on performance, and exploring different approaches mattered more than background or experience.
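The abstract's "Chosen Model" factor (explore several models, keep the best, possibly as a hybrid) can be illustrated with a small selection loop. The sketch below is a minimal illustration under stated assumptions, not the paper's method: it assumes ROC AUC as the comparison metric and an unweighted average of scores as the "hybrid", since the survey summary does not say how respondents combined models. The function names (`roc_auc`, `blend`, `select_best`) and the toy scores are hypothetical; in practice the candidate scores would come from models such as random forests or gradient boosted machines evaluated on a held-out set.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) formulation; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def blend(*score_lists):
    """Assumed 'hybrid': unweighted mean of each model's score per example."""
    return [sum(vals) / len(vals) for vals in zip(*score_lists)]


def select_best(labels, candidates):
    """Score every single model and every pairwise blend; return (name, auc) of the best."""
    pool = dict(candidates)
    for (a, sa), (b, sb) in combinations(candidates.items(), 2):
        pool[f"{a}+{b}"] = blend(sa, sb)
    results = {name: roc_auc(labels, scores) for name, scores in pool.items()}
    best = max(results, key=results.get)
    return best, results[best]


if __name__ == "__main__":
    # Hypothetical held-out labels and scores, not data from the competition.
    labels = [1, 1, 0, 0]
    candidates = {
        "model_a": [0.75, 0.25, 0.5, 0.0],
        "model_b": [0.25, 0.75, 0.0, 0.5],
    }
    print(select_best(labels, candidates))
```

With these toy scores each single model reaches an AUC of 0.75, while their average ranks every positive above every negative, so the blend `"model_a+model_b"` is selected with an AUC of 1.0. This mirrors, in miniature, the abstract's point that exploring and combining approaches can matter more than committing to one model.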
Keywords: data science, predictive modeling, successful factors with optimal models, predictive model success, survey of modelers, crowdsourcing, modeler performance