Deep Bellman Hedging

18 Pages. Posted: 13 Jul 2022. Last revised: 2 Jan 2023.

Hans Buehler

XTX Markets

Murray Phillip

J.P. Morgan Chase & Co.

Ben Wood

JP Morgan Chase

Date Written: June 30, 2022


We present an actor-critic-type reinforcement learning algorithm for hedging a portfolio of financial instruments, such as securities and over-the-counter derivatives, using purely historical data. The key characteristics of our approach are: the ability to hedge with derivatives such as forwards, swaps, futures, and options; incorporation of trading frictions such as trading costs and liquidity constraints; applicability to any reasonable portfolio of financial instruments; realistic, continuous state and action spaces; and formal risk-adjusted return objectives.

Most importantly, the trained model provides an optimal hedge for arbitrary initial portfolios and market states without the need for re-training.

We also prove the existence of finite solutions to our Bellman equation, and show the relation to our vanilla Deep Hedging approach.
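As an illustration of what a formal risk-adjusted return objective can look like, the sketch below computes the entropic certainty equivalent U(X) = -(1/lambda) log E[exp(-lambda X)] of sampled profit-and-loss values. This is a standard textbook example whose negative is a convex risk measure; the abstract does not say which objective the paper uses, so the choice of the entropic form, the function name `entropic_utility`, and the parameter `risk_aversion` are all assumptions made here for illustration.

```python
import numpy as np


def entropic_utility(pnl, risk_aversion=1.0):
    """Entropic certainty equivalent of sampled P&L (illustrative assumption,
    not necessarily the objective used in the paper).

    U(X) = -(1 / lam) * log( mean( exp(-lam * X) ) )
    """
    pnl = np.asarray(pnl, dtype=float)
    z = -risk_aversion * pnl
    shift = z.max()  # log-sum-exp shift to avoid overflow in exp
    log_mean_exp = shift + np.log(np.mean(np.exp(z - shift)))
    return -log_mean_exp / risk_aversion


if __name__ == "__main__":
    # Hypothetical P&L samples: same mean, different dispersion.
    calm = np.array([0.9, 1.0, 1.1])
    wild = np.array([-1.0, 1.0, 3.0])
    print(entropic_utility(calm), entropic_utility(wild))
```

Under this objective a deterministic payoff is valued at its face amount, while a risky payoff is valued below its mean, with the discount growing in `risk_aversion`.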

Keywords: Deep Hedging, Reinforcement Learning, Convex Risk Measures, Hedging

Suggested Citation

Buehler, Hans and Phillip, Murray and Wood, Ben, Deep Bellman Hedging (June 30, 2022). Available at SSRN.

Hans Buehler (Contact Author)

XTX Markets

14-18 Handyside Street
London, Greater London N1C 4DN
United Kingdom


Murray Phillip

J.P. Morgan Chase & Co.

60 Wall St.
New York, NY 10260
United States

Ben Wood

JP Morgan Chase

United Kingdom
