Overfitting: Causes and Solutions (Seminar Slides)

24 Pages. Posted: 26 Feb 2020. Last revised: 2 Mar 2020.


Marcos Lopez de Prado

Cornell University - Operations Research & Industrial Engineering; Abu Dhabi Investment Authority; True Positive Technologies

Date Written: February 26, 2020

Abstract

When used incorrectly, machine learning (ML) carries an extremely high risk of overfitting. However, ML offers sophisticated methods to prevent both (a) train set overfitting and (b) test set overfitting.

Thus, the popular belief that ML overfits is false. A more accurate statement would be that (1) in the wrong hands, ML overfits, and (2) in the right hands, ML is more robust to overfitting than classical methods.

When it comes to modelling unstructured data, ML is the only choice. Classical statistics should be taught as a preparation for ML courses, with a focus on overfitting prevention.

Keywords: Machine learning, econometrics, backtest overfitting, selection bias, multiple testing, false discoveries

JEL Classification: G0, G1, G2, G15, G24, E44

Suggested Citation

López de Prado, Marcos, Overfitting: Causes and Solutions (Seminar Slides) (February 26, 2020). Available at SSRN: https://ssrn.com/abstract=3544431 or http://dx.doi.org/10.2139/ssrn.3544431

Marcos López de Prado (Contact Author)

Cornell University - Operations Research & Industrial Engineering

237 Rhodes Hall
Ithaca, NY 14853
United States

HOME PAGE: http://www.orie.cornell.edu

Abu Dhabi Investment Authority

211 Corniche Road
Abu Dhabi, Abu Dhabi PO Box 3600
United Arab Emirates

HOME PAGE: http://www.adia.ae

True Positive Technologies

NY
United States

HOME PAGE: http://www.truepositive.com

