Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses

30 Pages. Posted: 8 Apr 2019. Last revised: 11 May 2019.


Rong Jin

Alibaba Group

David Simchi-Levi

Massachusetts Institute of Technology (MIT) - School of Engineering

Li Wang

Massachusetts Institute of Technology (MIT) - Operations Research Center

Xinshang Wang

Alibaba Group

Sen Yang

Alibaba Group

Date Written: February 26, 2019

Abstract

The rising popularity of ultra-fast delivery services on retail platforms is fueling the use of urban warehouses, whose proximity to customers makes fast delivery viable. Limited space in urban warehouses poses a problem for online retailers: the number of products (SKUs) they carry is no longer "the more, the better", yet it can still be large, reaching hundreds or thousands in a product category. In this paper, we study algorithms for dynamically selecting, on the fly, a large number of products with the highest customer purchase probabilities from an ocean of potential products to offer on retailers' ultra-fast delivery platforms.

We distill the product selection problem into a semi-bandit model with linear generalization. There are N different arms in total, each with a feature vector of dimension d. The player pulls K arms in each period and observes the bandit feedback from each of the pulled arms. We focus on the setting where K is much greater than the total number of time periods T or the dimension of product features d. We first analyze a standard UCB algorithm and show that its regret bound can be expressed as the sum of a T-independent part Õ(Kd^(3/2)) and a T-dependent part Õ(d √(KT)), which we refer to as the "fixed cost" and the "variable cost" respectively. To reduce the fixed cost for large values of K, we propose a novel online learning algorithm that iteratively shrinks the upper confidence bounds within each period, and show that its fixed cost is reduced by a factor of d to Õ(K √(d)). Moreover, we test the algorithms on an industrial dataset from Alibaba Group. Experimental results show that our new algorithm reduces the total regret of the standard UCB algorithm by at least 10%.
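To make the semi-bandit setting concrete, the following is a minimal sketch of a standard linear UCB baseline that pulls the K arms with the largest upper confidence bounds in each period. It is not the paper's shrinking algorithm, whose details the abstract does not give. The function names, the Gaussian noise model, the identity regularizer, and the confidence multiplier `alpha` are all assumptions made for illustration.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mat_vec(M, x):
    return [dot(row, x) for row in M]


def rank_one_update(V_inv, x):
    """Sherman-Morrison: inverse of (V + x x^T), given the inverse of a symmetric V."""
    Vx = mat_vec(V_inv, x)
    denom = 1.0 + dot(x, Vx)
    d = len(x)
    return [[V_inv[i][j] - Vx[i] * Vx[j] / denom for j in range(d)] for i in range(d)]


def select_top_k_ucb(features, V_inv, b, K, alpha):
    """Return the indices of the K arms with the largest UCB scores, best first."""
    theta_hat = mat_vec(V_inv, b)  # regularized least-squares estimate

    def ucb(x):
        # Estimated mean plus a confidence width alpha * sqrt(x^T V^{-1} x).
        width = alpha * math.sqrt(max(dot(x, mat_vec(V_inv, x)), 0.0))
        return dot(x, theta_hat) + width

    scores = [ucb(x) for x in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:K]


def run_simulation(N=100, d=4, K=10, T=20, alpha=1.0, noise=0.1, seed=0):
    """Simulate T periods on synthetic data (hypothetical) and return per-period regret."""
    rng = random.Random(seed)
    features = [[rng.gauss(0, 1) / math.sqrt(d) for _ in range(d)] for _ in range(N)]
    theta = [rng.gauss(0, 1) for _ in range(d)]  # unknown parameter, assumed linear model
    expected = [dot(x, theta) for x in features]
    best = sum(sorted(expected, reverse=True)[:K])  # value of the best K arms

    V_inv = [[float(i == j) for j in range(d)] for i in range(d)]  # inverse of identity
    b = [0.0] * d
    regrets = []
    for _ in range(T):
        chosen = select_top_k_ucb(features, V_inv, b, K, alpha)
        # Semi-bandit feedback: one noisy observation per pulled arm.
        for i in chosen:
            reward = expected[i] + rng.gauss(0, noise)
            V_inv = rank_one_update(V_inv, features[i])
            b = [bj + reward * xj for bj, xj in zip(b, features[i])]
        regrets.append(best - sum(expected[i] for i in chosen))
    return regrets
```

Calling `run_simulation()` returns a list of T per-period regrets. In this baseline all K arms in a period are chosen from the same confidence bounds, computed before any of that period's feedback arrives. The abstract's proposed algorithm instead shrinks the bounds iteratively within each period.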

Keywords: sequential decision making, adaptive product selection, online learning, online retailing, stochastic optimization, regret analysis

Suggested Citation

Jin, Rong and Simchi-Levi, David and Wang, Li and Wang, Xinshang and Yang, Sen, Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses (February 26, 2019). Available at SSRN: https://ssrn.com/abstract=3342761 or http://dx.doi.org/10.2139/ssrn.3342761

Rong Jin

Alibaba Group

David Simchi-Levi

Massachusetts Institute of Technology (MIT) - School of Engineering

MA
United States

Li Wang (Contact Author)

Massachusetts Institute of Technology (MIT) - Operations Research Center

77 Massachusetts Avenue
Bldg. E40-103
Cambridge, MA 02139
United States

Xinshang Wang

Alibaba Group

Sen Yang

Alibaba Group

