Training Deep Q-Network via Monte Carlo Tree Search for Adaptive Bitrate Control in Video Delivery
40 Pages. Posted: 11 Dec 2022
Date Written: December 9, 2022
To maximize users’ Quality of Experience (QoE), Adaptive Bitrate (ABR) algorithms automatically adjust a video’s bitrate during delivery for online playback. Designing an effective ABR algorithm faces three challenges: the delayed consequences of actions, the high volatility of bandwidth, and the diversity of client devices. To address these challenges, we train a Deep Q-Network (DQN) for ABR using a novel simulation-based training policy constructed with Monte Carlo Tree Search (MCTS). We run MCTS and apply Common Random Numbers (CRN) in the simulations to accurately estimate action values, which serve as the labels for training the DQN. We theoretically prove the consistency of the UCT-type selection policy we adopt and the effectiveness of CRN in reducing the variance of action-value estimates. Experiments on both simulated non-stationary bandwidth data and real datasets show that our method significantly outperforms existing reinforcement learning algorithms and other state-of-the-art ABR methods.
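As a rough illustration of why Common Random Numbers help when comparing actions, the sketch below estimates the QoE difference between two bitrate choices with and without shared random bandwidth traces. It is a toy model, not the paper's simulator: the function names (`simulate_qoe`, `difference_samples`), the uniform bandwidth model, the rebuffering penalty weight, and all numeric settings are assumptions made for this example.

```python
import random
import statistics


def simulate_qoe(bitrate, seed, horizon=20):
    """Toy QoE of holding one bitrate over a random bandwidth trace (hypothetical model)."""
    rng = random.Random(seed)  # the seed fully determines the bandwidth trace
    qoe = 0.0
    for _ in range(horizon):
        bandwidth = rng.uniform(1.0, 5.0)  # assumed bandwidth draw, in Mbps
        # Assumed stall penalty: positive only when the bitrate exceeds the bandwidth.
        rebuffer = max(0.0, bitrate / bandwidth - 1.0)
        qoe += bitrate - 4.0 * rebuffer
    return qoe


def difference_samples(bitrate_a, bitrate_b, n, use_crn, base_seed=0):
    """Sample QoE(a) - QoE(b); with CRN both actions see the same bandwidth trace."""
    diffs = []
    for i in range(n):
        seed_a = base_seed + 2 * i
        seed_b = seed_a if use_crn else seed_a + 1  # independent trace without CRN
        diffs.append(simulate_qoe(bitrate_a, seed_a) - simulate_qoe(bitrate_b, seed_b))
    return diffs


crn = difference_samples(3.0, 2.0, n=500, use_crn=True)
ind = difference_samples(3.0, 2.0, n=500, use_crn=False)
var_crn = statistics.variance(crn)
var_ind = statistics.variance(ind)
print(f"variance with CRN: {var_crn:.2f}, with independent traces: {var_ind:.2f}")
```

In this toy setting both actions' QoE rises with bandwidth, so evaluating them on the same trace makes their outcomes positively correlated and the noise largely cancels in the difference. The abstract's theoretical result concerns the authors' own MCTS setting; this example only shows the general mechanism.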
Keywords: video delivery, reinforcement learning, deep Q-network, Monte Carlo tree search, common random numbers