Competitive Analysis for Machine Learning and Data Science

74 Pages. Posted: 25 Apr 2017. Last revised: 1 Feb 2019.

Date Written: January 30, 2019


Statistical machines learn from regularity in data and are often designed for stationary or even independent and identically distributed (IID) processes. In most real-world applications, however, it is not known how close the data process is to being IID, and this cannot be learned when past data may be misleading or otherwise unrepresentative of the future. An adversarial data process, by contrast, is not subject to probabilistic constraints; instead, an adversary can deterministically attempt to mislead or otherwise confuse the machine. Designing for an adversary has its own limitation, of course: pessimism. Despite the disparity between IIDness and adversarialism, it may not be known, for a given application, which better approximates the data. Fortunately, in many supervised settings, learning from data generated by an adaptive adversary is not much harder (statistically) than learning from data generated by a static distribution: the minimax expected regret values differ by only a constant factor. In that case, a machine optimally designed for an adversary is necessarily competitive with any other, even when the data process is IID.
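The competitive guarantee described above can be illustrated with a standard example not taken from the paper: the Hedge (exponential weights) algorithm, a learner designed for adversarial sequences whose regret bound of order sqrt(T log N) holds for any loss sequence whatsoever. The sketch below runs Hedge on IID data and checks that the adversarial guarantee still covers it; all names and the choice of Bernoulli losses are illustrative assumptions, not the paper's construction.

```python
# Illustrative sketch (assumed example, not the paper's method): Hedge, an
# adversarially-designed learner, remains competitive on IID data because its
# regret bound holds against arbitrary loss sequences.
import math
import random

def hedge(loss_rows, eta):
    """Run exponential weights over N experts for T rounds.

    loss_rows: list of T lists, each containing N losses in [0, 1].
    Returns (learner_expected_loss, best_expert_cumulative_loss).
    """
    n = len(loss_rows[0])
    weights = [1.0] * n
    learner_loss = 0.0
    cum = [0.0] * n  # cumulative loss of each expert
    for losses in loss_rows:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of the randomized learner this round.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cum[i] += l
            weights[i] *= math.exp(-eta * l)  # downweight lossy experts
    return learner_loss, min(cum)

random.seed(0)
T, N = 1000, 10
# IID Bernoulli losses; expert 0 is slightly better on average (0.3 vs 0.5).
rows = [[1.0 if random.random() < (0.3 if i == 0 else 0.5) else 0.0
         for i in range(N)]
        for _ in range(T)]
eta = math.sqrt(2 * math.log(N) / T)  # standard tuning for known horizon T
learner, best = hedge(rows, eta)
regret = learner - best
bound = math.sqrt(2 * T * math.log(N))  # adversarial worst-case regret bound
print(regret <= bound)
```

The point of the check at the end is the abstract's claim in miniature: the bound was derived with no probabilistic assumptions, so it applies unchanged when the data happen to be IID, and the adversarially-designed learner loses at most a constant factor relative to one tuned for the IID case.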

Keywords: incomplete information, online learning, time series, agnostic learning

JEL Classification: C44, C72

Suggested Citation

Spece, Michael, Competitive Analysis for Machine Learning and Data Science (January 30, 2019). Available at SSRN:

Michael Spece (Contact Author)

AllocateRite

1330 Avenue of the Americas
Suite 600 B
New York, NY 10019
United States
