Adaptive Multi-Model Fusion Learning for Sparse-Reward Reinforcement Learning
13 Pages Posted: 1 Jun 2024
Abstract
In this paper, we consider intrinsic reward generation for sparse-reward reinforcement learning, wherein the agent receives sparse extrinsic rewards from the environment. Conventionally, intrinsic reward generation relies on model prediction errors, where the agent's learning model estimates target values or distributions. The intrinsic reward is crafted as the disparity between the model's prediction output and the actual target, leveraging the tendency that less-visited state-action pairs yield larger prediction errors. We extend this approach to accommodate multiple prediction models, proposing an adaptive fusion technique tailored to the multi-model setting. To streamline the search for the optimal fusion rule, we impose axiomatic conditions that any viable fusion method should meet, and justify these conditions mathematically. Subsequently, we introduce adaptive fusion, which dynamically learns the optimal prediction-error fusion strategy throughout the learning process, thereby enhancing overall learning performance. Our numerical experiments demonstrate the superiority of the proposed intrinsic reward generation method over existing approaches, with performance gains observed across various tasks.
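The mechanism described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not the paper's implementation): each of K fixed random target networks is paired with a learned predictor, the per-model intrinsic signal is the squared prediction error, and the fused intrinsic reward is a convex combination of the per-model errors with weights given by a softmax over learnable fusion logits. All names, dimensions, and the linear-model form are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K fixed random target maps and K learned predictors
# (random-network-distillation style); sizes are illustrative only.
K, state_dim, feat_dim = 3, 4, 8
targets = [rng.normal(size=(state_dim, feat_dim)) for _ in range(K)]
predictors = [np.zeros((state_dim, feat_dim)) for _ in range(K)]  # untrained

def prediction_errors(state):
    """Per-model squared prediction error for one state.

    Less-visited states tend to produce larger errors, since the
    predictors have not yet fit the targets there."""
    return np.array([np.mean((state @ T - state @ P) ** 2)
                     for T, P in zip(targets, predictors)])

def fused_intrinsic_reward(state, logits):
    """Adaptive fusion: combine per-model errors with softmax weights.

    The logits would be adapted during training; here they are fixed."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return float(w @ prediction_errors(state))

state = rng.normal(size=state_dim)
logits = np.zeros(K)  # uniform fusion before any adaptation
r_int = fused_intrinsic_reward(state, logits)
```

With zero-initialized predictors and nonzero random targets, every per-model error is positive, so the fused intrinsic reward is strictly positive; as a predictor is trained toward its target on visited states, its contribution shrinks there, which is the visitation-dependent behavior the abstract relies on.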
Keywords: Deep reinforcement learning, Neural network, Sparse-reward reinforcement learning, Intrinsic reward, Multiple prediction models, Adaptive fusion