Competitive Analysis for Machine Learning and Data Science
74 Pages · Posted: 25 Apr 2017 · Last revised: 1 Feb 2019
Date Written: January 30, 2019
Abstract
Statistical machines learn from regularity in data and are often designed for stationary or even independent and identically distributed (IID) processes. In most real-world applications, however, it is not known how close the data process is to being IID, and this closeness cannot be learned when past data may be misleading or otherwise unrepresentative of the future. An adversarial data process, on the other hand, is not subject to probabilistic constraints; instead, an adversary can deterministically attempt to mislead or otherwise confuse the machine. Designing for an adversary has its own limitation, of course: pessimism. Despite the disparity between the IID and adversarial models, for a given application it may not be known which better approximates the data. Fortunately, in many supervised settings, learning from data generated by an adaptive adversary is statistically not much harder than learning from data generated by a fixed distribution: the two minimax expected regret values differ by only a constant factor. In that case, a machine optimally designed for an adversary is necessarily competitive with any other machine, even when the data process is IID.
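One way to read the constant-factor claim, in notation assumed here for illustration rather than taken from the paper: let V_T(adv) and V_T(iid) denote the minimax expected regret after T rounds when the data are chosen by an adaptive adversary and when they are drawn IID from a worst-case distribution, respectively. The claim then amounts to

V_T(iid) ≤ V_T(adv) ≤ C · V_T(iid), for some constant C ≥ 1 that does not grow with T,

so an algorithm attaining the adversarial minimax rate is, up to the factor C, optimal even when the data turn out to be IID.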
Keywords: incomplete information, online learning, time series, agnostic learning
JEL Classification: C44, C72