Learning in Combinatorial Optimization: What and How to Explore

Sajad Modaresi, Denis Sauré, Juan Pablo Vielma (2020). Learning in Combinatorial Optimization: What and How to Explore. Operations Research 68(5): 1585-1604

Posted: 26 Sep 2017; Last revised: 15 Apr 2024

Sajad Modaresi

University of North Carolina (UNC) at Chapel Hill - Kenan-Flagler Business School

Denis Sauré

University of Chile - Industrial Engineering

Juan Pablo Vielma

Massachusetts Institute of Technology (MIT) - Sloan School of Management

Date Written: March 15, 2019

Abstract

We study dynamic decision-making under uncertainty in which, at each period, a decision-maker implements a solution to a combinatorial optimization problem. The objective coefficient vectors of that problem, which are unobserved prior to implementation, vary from period to period; they are, however, known to be random draws from an initially unknown distribution with known range. By implementing different solutions, the decision-maker extracts information about the underlying distribution but also incurs the cost associated with those solutions. We show that efficiently resolving the implied exploration-versus-exploitation trade-off is related to solving a Lower Bound Problem (LBP), which simultaneously answers the questions of what to explore and how to do so. We establish a fundamental limit on the asymptotic performance of any admissible policy that is proportional to the optimal objective value of the LBP. We show that this lower bound can be asymptotically attained by policies that adaptively reconstruct and solve the LBP at an exponentially decreasing frequency. Because the LBP is likely intractable in practice, we propose policies that instead reconstruct and solve a proxy for it, which we call the Optimality Cover Problem (OCP). We provide strong evidence of the practical tractability of the OCP, which implies that the proposed policies can be implemented in real time. We test the performance of the proposed policies through extensive numerical experiments and show that they significantly outperform relevant benchmarks in the long term and are competitive in the short term.
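To make the setting concrete, the following is a minimal illustrative sketch of a combinatorial bandit of the kind the abstract describes: feasible solutions are subsets of a ground set (here, paths viewed as sets of edges), element costs are drawn each period from unknown distributions, and implementing a solution reveals only the costs of its elements. The exploration schedule below is a simple diminishing-randomization heuristic for illustration; it is *not* the paper's LBP/OCP-based policy, and all names and numbers are invented for the example.

```python
import random

random.seed(0)

# Edge index -> true mean cost (unknown to the decision-maker).
true_means = [0.3, 0.5, 0.4, 0.2]

# Feasible combinatorial solutions: subsets of edges (e.g., paths).
solutions = [frozenset({0, 1}), frozenset({2, 3}), frozenset({0, 3})]

counts = [0] * len(true_means)   # observations per edge
sums = [0.0] * len(true_means)   # running sum of observed edge costs

def estimated_cost(sol):
    # Optimistic default (0.0) for never-observed edges forces initial exploration.
    return sum(sums[e] / counts[e] if counts[e] else 0.0 for e in sol)

T = 2000
total_cost = 0.0
for t in range(1, T + 1):
    # Explore with probability 1/t, exploit the empirical best otherwise.
    if t <= len(solutions) or random.random() < 1.0 / t:
        sol = random.choice(solutions)
    else:
        sol = min(solutions, key=estimated_cost)
    # Implementing a solution reveals the realized costs of its edges only.
    for e in sol:
        c = random.gauss(true_means[e], 0.1)
        counts[e] += 1
        sums[e] += c
        total_cost += c

best = min(solutions, key=lambda s: sum(true_means[e] for e in s))
print("best solution:", sorted(best))   # {0, 3}, with mean cost 0.5
print("avg per-period cost:", round(total_cost / T, 3))
```

Note the feature the paper exploits: observing one solution's edges is informative about *other* solutions that share those edges, which is why deciding what to explore (which elements) and how (via which solutions) is itself an optimization problem.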

Keywords: Combinatorial Optimization, Multi-Armed Bandit, Mixed-Integer Programming

Suggested Citation

Modaresi, Sajad, Denis Sauré, and Juan Pablo Vielma (2020). Learning in Combinatorial Optimization: What and How to Explore (March 15, 2019). Operations Research 68(5): 1585-1604. Available at SSRN: https://ssrn.com/abstract=3041893 or http://dx.doi.org/10.2139/ssrn.3041893

Sajad Modaresi (Contact Author)

University of North Carolina (UNC) at Chapel Hill - Kenan-Flagler Business School ( email )

McColl Building
Chapel Hill, NC 27599-3490
United States

Denis Sauré

University of Chile - Industrial Engineering ( email )

República 701, Santiago
Chile

Juan Pablo Vielma

Massachusetts Institute of Technology (MIT) - Sloan School of Management ( email )

77 Massachusetts Ave.
E62-561
Cambridge, MA 02142
United States
