When Should we Expect Non-Decreasing Returns from Data in Prediction Tasks?

26 Pages Posted: 5 Mar 2025


Maximilian Schaefer

Institut Mines-Télécom Business School

Date Written: March 05, 2025

Abstract

This article studies how the prediction accuracy of a response variable changes as the number of predictors increases, when all variables follow a multivariate normal distribution. Assuming that the correlations between variables are independently drawn, I show that adding variables yields globally increasing returns to scale when the mean of the correlation distribution is zero. The speed of learning increases with the variance of the correlation distribution. I use simulations to study the more complex case of correlation distributions with a non-zero mean and find a pattern of decreasing returns followed by increasing returns to scale, provided the variance of the correlations is not degenerate; otherwise, globally decreasing returns emerge. To analyze returns from adding variables in a more realistic setting, I train a collaborative filtering algorithm on the MovieLens 1M dataset and find globally increasing returns to scale across 2,000 variables. The results suggest significant scale advantages from additional variables in prediction tasks.
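The setup described above can be illustrated with a minimal simulation sketch. It assumes mutually independent, unit-variance predictors whose correlations with the response are drawn i.i.d. from a zero-mean distribution (the distribution choice and independence of predictors are simplifying assumptions for illustration, not the paper's exact model). Under these assumptions, the population R² of the best linear predictor using the first k predictors is the sum of the first k squared correlations, so the learning curve can be traced directly:

```python
import numpy as np

rng = np.random.default_rng(0)

n_predictors = 50
# Correlations between the response Y and each predictor, drawn i.i.d.
# with mean zero (hypothetical distribution, chosen for illustration).
rho = rng.normal(0.0, 0.1, size=n_predictors)

# With mutually independent unit-variance predictors, the population R^2
# of the best linear predictor using the first k predictors is the
# cumulative sum of squared correlations. A valid joint normal
# distribution requires this sum to stay at or below 1, so we clip.
r2 = np.clip(np.cumsum(rho**2), 0.0, 1.0)

# Marginal accuracy gain contributed by each additional predictor.
gains = np.diff(r2, prepend=0.0)
```

Plotting `r2` against the number of predictors traces the learning curve whose shape (increasing vs. decreasing returns) the paper characterizes as a function of the mean and variance of the correlation distribution.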

JEL Classification: C53, C55, D83, L86

Keywords: Collaborative Filtering, Data as Barrier to Entry, Learning from Data, Increasing Returns to Scale

Suggested Citation

Schaefer, Maximilian, When Should we Expect Non-Decreasing Returns from Data in Prediction Tasks? (March 05, 2025). Available at SSRN: https://ssrn.com/abstract=5166627 or http://dx.doi.org/10.2139/ssrn.5166627

Maximilian Schaefer (Contact Author)

Institut Mines-Télécom Business School

9 rue Charles Fourier
Evry, 91011
France

