Dynamic Marketing Policies: Constructing Markov States for Reinforcement Learning

37 Pages Posted: 16 Jul 2020

Yuting Zhu

Massachusetts Institute of Technology (MIT)

Duncan Simester

Massachusetts Institute of Technology (MIT) - Sloan School of Management

Jonathan A. Parker

Massachusetts Institute of Technology (MIT) - Sloan School of Management; National Bureau of Economic Research (NBER)

Antoinette Schoar

Massachusetts Institute of Technology (MIT) - Sloan School of Management; National Bureau of Economic Research (NBER)

Date Written: May 2020

Abstract

Many firms want to target their customers with a sequence of marketing actions, rather than just a single action. We interpret sequential targeting problems as Markov Decision Processes (MDPs), which can be solved using a range of Reinforcement Learning (RL) algorithms. MDPs require the construction of Markov state spaces. These state spaces summarize the current information about each customer in each time period, so that movements over time between Markov states describe customers’ dynamic paths. The Markov property requires that the states be “memoryless”: future outcomes depend only upon the current state, not upon earlier states.
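Stated formally (in our notation, not necessarily the paper’s): with s_t denoting a customer’s state at time t and a_t the firm’s marketing action, the Markov property requires

\[
\Pr\bigl(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0\bigr) = \Pr\bigl(s_{t+1} \mid s_t, a_t\bigr),
\]

i.e., conditional on the current state and action, the distribution of the next state is independent of the earlier history.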

We propose a method for constructing Markov states from historical transaction data by adapting a method proposed in the computer science literature. Rather than designing states in transaction space, we construct predictions of how customers will respond to a firm’s marketing actions. We then design states using these predictions, grouping customers together if their predicted behavior is similar. To make this approach computationally tractable, we adapt the method to exploit a common feature of transaction data: sparsity. As a result, a problem that faces computational challenges in many settings becomes feasible in a marketing setting. The method is straightforward to implement, and the resulting states can be used in standard RL algorithms. We evaluate the method using a novel validation approach. The findings confirm that the constructed states satisfy the Markov property and are robust to the introduction of non-Markov distortions in the data.
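To make the two-step structure concrete, here is a minimal sketch in Python. It uses logistic regression and k-means as stand-ins for the prediction and grouping steps; the synthetic data, the model choices, and the scikit-learn usage are illustrative assumptions, not the authors’ implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.cluster import KMeans

    # Hypothetical inputs: one row per customer-period.
    # X: transaction features (sparse 0/1 in practice), a: action taken, y: response.
    rng = np.random.default_rng(0)
    n, d, n_actions, n_states = 5000, 20, 3, 10
    X = (rng.random((n, d)) < 0.1).astype(float)   # sparse transaction indicators
    a = rng.integers(n_actions, size=n)            # which action the firm used
    y = (rng.random(n) < 0.2).astype(int)          # observed customer response

    # Step 1: predict how each customer would respond to each marketing action,
    # fitting one model per action on the periods where that action was used.
    preds = np.zeros((n, n_actions))
    for action in range(n_actions):
        mask = a == action
        model = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])
        preds[:, action] = model.predict_proba(X)[:, 1]

    # Step 2: define states in prediction space rather than transaction space:
    # customers with similar predicted responses share the same Markov state.
    states = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit_predict(preds)

The key design choice the sketch illustrates is that the clustering operates on the (low-dimensional) vector of predicted responses rather than on the raw, sparse transaction features.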

Suggested Citation

Zhu, Yuting and Simester, Duncan and Parker, Jonathan A. and Schoar, Antoinette, Dynamic Marketing Policies: Constructing Markov States for Reinforcement Learning (May 2020). Available at SSRN: https://ssrn.com/abstract=3633870 or http://dx.doi.org/10.2139/ssrn.3633870

Yuting Zhu (Contact Author)

Massachusetts Institute of Technology (MIT) ( email )

77 Massachusetts Avenue
50 Memorial Drive
Cambridge, MA 02139-4307
United States

Duncan Simester

Massachusetts Institute of Technology (MIT) - Sloan School of Management ( email )

Management Science
Cambridge, MA 02142
United States
617-258-0679 (Phone)
617-258-7597 (Fax)

Jonathan A. Parker

Massachusetts Institute of Technology (MIT) - Sloan School of Management ( email )

100 Main Street
E62-416
Cambridge, MA
United States
617-253-7218 (Phone)

National Bureau of Economic Research (NBER)

1050 Massachusetts Avenue
Cambridge, MA 02138
United States

Antoinette Schoar

Massachusetts Institute of Technology (MIT) - Sloan School of Management ( email )

50 Memorial Drive, E52-447
Cambridge, MA 02142
United States
617-253-3763 (Phone)
617-258-6855 (Fax)

National Bureau of Economic Research (NBER) ( email )

1050 Massachusetts Avenue
Cambridge, MA 02138
United States
