Deep Execution - Value and Policy Based Reinforcement Learning for Trading and Beating Market Benchmarks
32 Pages Posted: 21 May 2019
Date Written: April 21, 2019
In this article we introduce the term "Deep Execution", which utilizes deep reinforcement learning (DRL) for optimal execution. We demonstrate two different approaches to solving the optimal execution problem: (1) the deep double Q-network (DDQN), a value-based approach, and (2) proximal policy optimization (PPO), a policy-based approach, for trading and beating market benchmarks such as the time-weighted average price (TWAP). We show that, first, DRL can reach the theoretically derived optimum by acting on the environment directly. Second, the DRL agents can learn to capitalize on price trends (alpha signals) without directly observing the price. Finally, DRL can take advantage of the available information to create dynamic strategies as an informed trader and thus outperform static benchmark strategies such as the TWAP.
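The TWAP benchmark mentioned above can be illustrated with a minimal sketch: a parent order is sliced into equal child orders across the trading horizon, and the achieved price is the volume-weighted average over those slices. The function names and the synthetic price path below are illustrative, not taken from the paper:

```python
import random

def twap_schedule(total_shares, n_slices):
    """Split a parent order into (near-)equal child orders, the static TWAP schedule."""
    base = total_shares // n_slices
    sizes = [base] * n_slices
    # Distribute any remainder one share at a time across the first slices.
    for i in range(total_shares - base * n_slices):
        sizes[i] += 1
    return sizes

def average_execution_price(prices, sizes):
    """Volume-weighted average price achieved by a given execution schedule."""
    cost = sum(p * q for p, q in zip(prices, sizes))
    return cost / sum(sizes)

random.seed(0)
# Synthetic mid-price path over 10 trading intervals (random walk).
prices = [100.0]
for _ in range(9):
    prices.append(prices[-1] + random.gauss(0, 0.1))

sizes = twap_schedule(1000, 10)
print(sizes)  # ten equal child orders of 100 shares
print(round(average_execution_price(prices, sizes), 4))
```

A DRL agent in this setting would replace the fixed `sizes` with actions chosen state by state, which is what allows it to adapt to trends the static schedule ignores.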
Keywords: Algorithmic Trading, Deep Learning, Execution Algorithms, Reinforcement Learning, Optimal Execution
JEL Classification: C00