Mastering Visual Reinforcement Learning Via Positive Unlabeled Policy-Guided Contrast
33 Pages · Posted: 16 Dec 2024
Abstract
Reinforcement learning has attracted significant attention in recent years. A fundamental yet challenging problem in this paradigm is perceiving high-dimensional environment information, which has given rise to visual reinforcement learning: learning representations from pixel observations for policy optimization. In this article, we examine the frameworks of benchmark methods in depth and expose a long-standing paradox challenging current methods: depending on the training phase, exploring visual semantic information can either improve the learned feature representations or prevent them from improving further. We further show that, in practice, an over-redundancy issue generally stalls gains in sample efficiency across baseline methods. To remedy this deficiency of existing methods, we introduce a novel plug-and-play method for visual reinforcement learning. Our model employs a positive unlabeled policy-guided contrast to jointly learn anti-redundant, policy-optimization-relevant pixel semantic information during training. To elucidate the proposed method's advantages, we revisit the visual reinforcement learning paradigm from an information-theoretic perspective. The theoretical analysis shows that the proposed method achieves a tighter lower bound on the mutual information between the policy-optimization-relevant information and the representation produced by the encoder. To evaluate our model, we conduct extensive benchmark experiments and demonstrate the superior performance of our method over existing methods in pixel-observation environments.
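Since the abstract describes the contrastive objective only at a high level, the following is a minimal sketch of how a policy-guided contrastive loss over unlabeled candidate representations could look. The function name, the `policy_weights` input, and the soft-target weighting scheme are assumptions made for illustration, not the authors' published implementation.

```python
# Illustrative sketch (PyTorch) of a policy-guided contrastive loss over
# unlabeled candidates, in the spirit of the abstract. Names and the exact
# weighting scheme are assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F

def pu_policy_contrast(anchor, candidates, policy_weights, temperature=0.1):
    """InfoNCE-style loss where the positive signal is soft and policy-derived.

    anchor:         (B, D) encoded anchor observations
    candidates:     (B, K, D) encoded unlabeled candidates (an unknown mix of
                    positives and negatives, e.g. augmented or nearby frames)
    policy_weights: (B, K) nonnegative scores from the current policy estimating
                    how relevant each candidate is to policy optimization
    """
    anchor = F.normalize(anchor, dim=-1)
    candidates = F.normalize(candidates, dim=-1)

    # Cosine similarities between each anchor and its K candidates.
    logits = torch.einsum('bd,bkd->bk', anchor, candidates) / temperature

    # Normalize policy scores into a soft target distribution over candidates,
    # so candidates the policy deems relevant act as (unlabeled) positives.
    targets = policy_weights / policy_weights.sum(dim=-1, keepdim=True).clamp(min=1e-6)

    # Soft-target cross-entropy over the candidate set: a standard
    # contrastive (InfoNCE-style) objective with policy-guided positives.
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

InfoNCE-style objectives of this form are known to lower-bound the mutual information between the anchor and its positive representations, which is the quantity the abstract's tighter-bound claim concerns.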
Keywords: Reinforcement Learning, Self-Supervised Learning, Contrastive Learning