Data-Pooling in Stochastic Optimization

81 Pages. Posted: 13 Jun 2019. Last revised: 30 Mar 2020

Vishal Gupta

Data Science and Operations, Marshall School of Business

Nathan Kallus

Cornell University - Cornell Tech NYC

Date Written: May 31, 2019


Managing large-scale systems often involves simultaneously solving thousands of unrelated stochastic optimization problems, each with limited data. Intuition suggests that one can decouple these unrelated problems and solve them separately without loss of generality. We propose a novel data-pooling algorithm called Shrunken-SAA that disproves this intuition. In particular, we prove that combining data across problems can outperform decoupling, even when there is no a priori structure linking the problems and the data are drawn independently. Our approach does not require strong distributional assumptions and applies to constrained, possibly non-convex, non-smooth optimization problems such as vehicle routing, economic lot-sizing, or facility location. We compare and contrast our results with a similar phenomenon in statistics (Stein's Phenomenon), highlighting unique features of the optimization setting that are not present in estimation. We further prove that as the number of problems grows large, Shrunken-SAA learns whether pooling can improve upon decoupling and the optimal amount to pool, even if the average amount of data per problem is fixed and bounded. Importantly, we present a simple intuition based on stability that explains when and why data-pooling offers a benefit, elucidating this perhaps surprising phenomenon. This intuition further suggests that data-pooling offers the most benefit when there are many problems, each of which has a small amount of relevant data. Finally, we demonstrate the practical benefits of data-pooling using real data from a chain of retail drug stores in the context of inventory management.
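
To make the idea of pooling by shrinkage concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes details the abstract does not state: demand takes values on a small discrete support, each problem is a newsvendor (inventory) problem, the anchor distribution is built by combining every problem's data, and the pooling amount `alpha` is fixed by hand rather than learned from the data as Shrunken-SAA does. The function names `pooled_anchor`, `shrunken_weights`, and `newsvendor_order` are hypothetical.

```python
from typing import List, Sequence


def pooled_anchor(all_counts: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical anchor: empirical distribution of all problems' data combined."""
    totals = [sum(column) for column in zip(*all_counts)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def shrunken_weights(
    counts: Sequence[int], anchor: Sequence[float], alpha: float
) -> List[float]:
    """Blend one problem's empirical distribution with the anchor.

    alpha = 0 recovers the decoupled sample-average weights; a larger alpha
    pulls the weights toward the anchor. (Assumed form of the shrinkage.)
    """
    n = sum(counts)
    if n + alpha <= 0:
        raise ValueError("need at least one observation or alpha > 0")
    return [(c + alpha * h) / (n + alpha) for c, h in zip(counts, anchor)]


def newsvendor_order(
    support: Sequence[float],
    weights: Sequence[float],
    underage: float,
    overage: float,
) -> float:
    """Pick the support point that minimizes weighted expected newsvendor cost."""

    def expected_cost(order: float) -> float:
        return sum(
            w * (underage * max(d - order, 0) + overage * max(order - d, 0))
            for d, w in zip(support, weights)
        )

    return min(support, key=expected_cost)


# Illustrative (made-up) data: three stores, demand observed on {0, 1, 2, 3}.
support = [0, 1, 2, 3]
all_counts = [[0, 0, 0, 1], [2, 1, 0, 0], [1, 2, 0, 0]]
anchor = pooled_anchor(all_counts)

# Store 0 has a single observation, so its decoupled decision follows that one
# data point, while the pooled decision also reflects the other stores' data.
decoupled = newsvendor_order(support, shrunken_weights(all_counts[0], anchor, 0.0), 1.0, 1.0)
pooled = newsvendor_order(support, shrunken_weights(all_counts[0], anchor, 20.0), 1.0, 1.0)
```

In this sketch, choosing `alpha` is left to the user; the abstract's claim is that Shrunken-SAA learns from the data whether to pool and how much.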

Keywords: Data-driven optimization, Small-data, large-scale regime, Shrinkage, James-Stein Estimation

JEL Classification: C02

Suggested Citation

Gupta, Vishal and Kallus, Nathan, Data-Pooling in Stochastic Optimization (May 31, 2019). Available at SSRN.

Vishal Gupta (Contact Author)

Data Science and Operations, Marshall School of Business

Marshall School of Business
BRI 401, 3670 Trousdale Parkway
Los Angeles, CA 90089
United States


Nathan Kallus

Cornell University - Cornell Tech NYC

111 8th Avenue #302
New York, NY 10011
United States
