Sports Prediction and Betting Models in the Machine Learning Age: The Case of Tennis
43 Pages. Posted: 9 Jan 2020. Last revised: 23 Jun 2020
Date Written: June 22, 2020
Machine learning and its numerous variants have become established tools in many areas of society. Several attempts have been made to apply machine learning to predicting the outcomes of professional sports events and to exploit "inefficiencies" in the corresponding betting markets. Using tennis as an example, this paper extends previous research by applying a wide range of machine learning techniques to one of the most extensive datasets, covering ten years of male and female professional singles matches. It analyzes two key questions. First, can machine learning techniques (e.g., random forests) outperform simpler techniques such as logistic regression in predicting the outcome of matches? In this context, what is the informational content of betting market odds and of historical match and player data? Second, can the various modeling techniques be used to provide consistent positive returns for bettors? Across all analyzed models, bookmakers' odds are found to encompass most of the information available for predicting match outcomes. Longer-term returns from betting strategies based on multiple prediction models and various money management strategies are mainly negative unless one assumes access to the most favorable market quotes. The analysis thus casts some doubt on studies that report an achievable “edge” for bettors.
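To make the two quantities discussed in the abstract concrete, the following is a minimal sketch, not the paper's actual method. It assumes decimal odds, removes the bookmaker margin by simple proportional normalization (one common convention among several), and evaluates a flat one-unit stake per bet. The function names and all sample odds are hypothetical and chosen only for illustration.

```python
def implied_probabilities(odds_a, odds_b):
    """Convert two-way decimal odds into probabilities that sum to 1.

    Assumption: the bookmaker margin is removed by proportional
    normalization; other margin-removal schemes exist.
    """
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    total = raw_a + raw_b  # exceeds 1 by the bookmaker's margin
    return raw_a / total, raw_b / total


def flat_stake_return(bets):
    """Profit per unit staked for a list of (decimal_odds, won) bets.

    Each bet risks one unit: a win pays odds - 1, a loss costs 1.
    """
    profit = sum((odds - 1.0) if won else -1.0 for odds, won in bets)
    return profit / len(bets)


# Hypothetical match: favorite quoted at 1.50, underdog at 2.60.
p_favorite, p_underdog = implied_probabilities(1.50, 2.60)

# The same four hypothetical bets, settled at average quotes and at
# the most favorable quotes available (made-up numbers).
average_quotes = [(1.80, True), (2.10, False), (1.95, True), (2.40, False)]
best_quotes = [(2.00, True), (2.30, False), (2.20, True), (2.60, False)]

return_average = flat_stake_return(average_quotes)
return_best = flat_stake_return(best_quotes)
```

With these made-up numbers the identical sequence of bets loses money at average quotes and makes money at the best quotes, which illustrates why the assumed access to favorable quotes matters when reported returns are compared.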
Keywords: Machine Learning, Logistic Regression, Classification, Sports Betting, Tennis
JEL Classification: C10, C53, Z20