Adaptive Sequential Experiments with Unknown Information Arrival Processes

77 Pages Posted: 26 Jul 2021 Last revised: 13 Jan 2022

Yonatan Gur

Stanford Graduate School of Business

Ahmadreza Momeni

Stanford University

Date Written: January 11, 2022


Sequential experiments that are deployed in a broad range of practices are characterized by an exploration-exploitation tradeoff that is well understood when, at each time period, feedback is received only on the action that was selected in that period. However, in many practical settings additional data may become available between decision epochs. We study the performance one may achieve when leveraging such auxiliary data, and the design of algorithms that effectively do so without prior knowledge of the information arrival process. Our formulation considers a broad class of distributions that are informative about rewards from actions, and allows auxiliary observations from these distributions to arrive according to an arbitrary and a priori unknown process. When it is known how to map auxiliary data to reward estimates, we characterize the best achievable performance as a function of the information arrival process. In terms of achieving optimal performance, we establish that upper confidence bound and Thompson sampling algorithms possess natural robustness with respect to the information arrival process, which uncovers a novel property of these popular algorithms. When the mappings connecting auxiliary data and rewards are unknown, we characterize a necessary and sufficient condition under which auxiliary data allows performance improvement, and devise an adaptive policy (termed 2UCBs) that guarantees near optimality. We use a data set from a large media site to analyze the value that may be captured by leveraging auxiliary data for designing content recommendations. Our study highlights the importance of utilizing auxiliary data in the design of sequential experiments, and characterizes how salient features of the auxiliary data stream impact performance.
Our study also emphasizes the risk in processing auxiliary information using non-adaptive approaches that are predicated on correct interpretation of this information, as opposed to deploying more flexible, adaptive methods.
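
To make the known-mapping setting concrete, the following is a minimal illustrative sketch, not the authors' policy and not the 2UCBs algorithm. It assumes auxiliary observations have already been mapped to reward estimates, and it simply lets a standard UCB1-style index absorb them into each arm's count and empirical mean. The class name, the exploration constant c, the Bernoulli rewards, and the 0.2 arrival probability in the simulation are all hypothetical choices made for illustration.

```python
import math
import random


class UCBWithAuxiliary:
    """UCB1-style index policy whose per-arm statistics also absorb
    auxiliary observations (assumed already mapped to reward estimates)."""

    def __init__(self, n_arms, c=2.0):
        self.counts = [0] * n_arms    # observations per arm, of either kind
        self.means = [0.0] * n_arms   # running empirical mean per arm
        self.c = c                    # hypothetical exploration constant

    def _update(self, arm, value):
        # Incremental mean update.
        self.counts[arm] += 1
        self.means[arm] += (value - self.means[arm]) / self.counts[arm]

    def observe_auxiliary(self, arm, reward_estimate):
        # Auxiliary data that arrived between decision epochs.
        self._update(arm, reward_estimate)

    def observe_reward(self, arm, reward):
        # Feedback on the action actually selected.
        self._update(arm, reward)

    def select_arm(self, t):
        # Try any arm that has no observation of either kind first.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        # Otherwise pick the largest upper confidence bound.
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a]
            + math.sqrt(self.c * math.log(t) / self.counts[a]),
        )


def simulate(horizon=2000, seed=0):
    """Run the policy against a made-up arrival process and return pull counts."""
    rng = random.Random(seed)
    true_means = [0.3, 0.5, 0.7]      # hypothetical Bernoulli arms
    policy = UCBWithAuxiliary(len(true_means))
    pulls = [0] * len(true_means)
    for t in range(1, horizon + 1):
        # Assumed arrival process: with probability 0.2, one free
        # observation arrives on a uniformly chosen arm.
        if rng.random() < 0.2:
            aux_arm = rng.randrange(len(true_means))
            policy.observe_auxiliary(
                aux_arm, float(rng.random() < true_means[aux_arm])
            )
        arm = policy.select_arm(t)
        policy.observe_reward(arm, float(rng.random() < true_means[arm]))
        pulls[arm] += 1
    return pulls
```

The policy never needs to know the arrival process: auxiliary observations only shrink the confidence width of the arms they concern, which is one informal way to read the robustness property described in the abstract.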

Keywords: Sequential experiments, online learning, multi-armed bandits, transfer learning, minimax complexity, adaptive algorithms, product recommendations

JEL Classification: C44, C45, C9

Suggested Citation

Gur, Yonatan and Momeni, Ahmadreza, Adaptive Sequential Experiments with Unknown Information Arrival Processes (January 11, 2022). Stanford University Graduate School of Business Research Paper, Forthcoming. Available at SSRN.

Yonatan Gur (Contact Author)

Stanford Graduate School of Business

655 Knight Way
Stanford, CA 94305-5015
United States

Ahmadreza Momeni

Stanford University

Stanford, CA 94305
United States
