A New Methodology for the Study of Dynamic Decision-Making with a Continuous Unknown Parameter
70 Pages. Posted: 1 Sep 2020
Date Written: August 21, 2020
A new methodology is presented to solve an important model of dynamic decision-making with a continuous unknown parameter (or state). The methodology centers on two concepts: the “continuation-value function” (which gives the expected value-to-go from every possible state under a feasible policy) and the “efficient frontier” of such functions in each period. When the model primitives can be described through a family of basis functions, e.g., polynomials, a continuation-value function retains that property and can be fully represented by a basis weight vector. The efficient frontiers of the weight vectors can be constructed through backward induction, which leads to an essential reduction in problem complexity and enables an exact solution for small-sized problems. A set of approximation methods based on the new methodology is developed to tackle larger problems. The methodology is also extended to the multi-dimensional (multi-parameter) setting, which features the important problem of contextual multi-armed bandits with linear expected rewards. We demonstrate that our approximation algorithm for that problem has a clear edge over three benchmark algorithms in the challenging learning environment with many actions and relatively short horizons.
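The weight-vector idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm; it only assumes that rewards and observation probabilities are polynomials in the unknown parameter θ on [0, 1], so that a one-step backup r(θ) + Σ_y p(y|θ) V_y(θ) is again a polynomial. The two-action example (a safe arm paying 0.5 and a Bernoulli arm with success probability θ), the function names `backup`, `prune_dominated`, and `frontier_step`, and the grid-based pointwise dominance check are all assumptions made for illustration; the paper's definition of the efficient frontier may differ.

```python
from itertools import product

import numpy as np
from numpy.polynomial import polynomial as P


def backup(reward, obs_probs, continuations):
    """Weights of r(theta) + sum_y p(y|theta) * V_y(theta).

    All arguments are coefficient vectors in ascending powers of theta.
    """
    out = np.asarray(reward, dtype=float)
    for p_y, v_y in zip(obs_probs, continuations):
        out = P.polyadd(out, P.polymul(p_y, v_y))
    return out


def prune_dominated(weights, grid, tol=1e-12):
    """Keep weight vectors that are not pointwise dominated on the grid.

    Assumption: dominance is checked only at grid points, which is an
    approximation of dominance over the whole interval.
    """
    values = [P.polyval(grid, w) for w in weights]
    keep = []
    for i, vi in enumerate(values):
        dominated = False
        for j, vj in enumerate(values):
            if j == i or not np.all(vj >= vi - tol):
                continue
            # Strictly better somewhere, or an identical earlier duplicate.
            if np.any(vj > vi + tol) or j < i:
                dominated = True
                break
        if not dominated:
            keep.append(weights[i])
    return keep


def frontier_step(actions, next_frontier, grid):
    """One backward-induction step over all actions and continuation plans."""
    candidates = []
    for reward, obs_probs in actions:
        # Choose one continuation function per possible observation.
        for plan in product(next_frontier, repeat=len(obs_probs)):
            candidates.append(backup(reward, obs_probs, plan))
    return prune_dominated(candidates, grid)


# Hypothetical example: theta is the success probability of a risky arm.
grid = np.linspace(0.0, 1.0, 11)
actions = [
    ([0.5], [[1.0]]),                        # safe arm: reward 0.5, no signal
    ([0.0, 1.0], [[0.0, 1.0], [1.0, -1.0]]),  # risky arm: success / failure
]
terminal = [np.array([0.0])]
frontier1 = frontier_step(actions, terminal, grid)   # one period to go
frontier2 = frontier_step(actions, frontier1, grid)  # two periods to go

for w in frontier2:
    print(np.round(w, 3))
```

In this toy setting, the plan "pull the risky arm, then switch to the safe arm after a success and stay risky after a failure" has value 2.5θ − θ², which is never better than the reverse plan with value 0.5 + 0.5θ + θ² (their difference is 2(θ − 0.5)²), so it is pruned from the two-period frontier. Each surviving function is stored as a short coefficient vector rather than as a function over all of [0, 1].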
Keywords: dynamic programming, information learning, dynamic pricing with learning, multi-armed bandits, partially observable Markov decision processes
JEL Classification: C11, C44, C61, D83