Mastering Visual Reinforcement Learning Via Positive Unlabeled Policy-Guided Contrast
33 Pages · Posted: 16 Dec 2024
Abstract
Reinforcement learning has attracted significant attention in recent years. A fundamental yet challenging problem in this paradigm is perceiving high-dimensional environment information, which has given rise to visual reinforcement learning: learning representations from pixel observations for policy optimization. In this article, we examine the frameworks of benchmark methods in depth and expose a long-standing paradox challenging current methods: depending on the training phase, exploring visual semantic information can either improve the learned feature representations or prevent them from improving further. We further show that, in practice, an over-redundancy issue generally stalls gains in sample efficiency across baseline methods. To remedy this deficiency of existing methods, we introduce a novel plug-and-play method for visual reinforcement learning. Our model employs a positive unlabeled policy-guided contrast to jointly learn anti-redundant, policy-optimization-relevant pixel semantic information during training. To elucidate the proposed method's advantages, we revisit the visual reinforcement learning paradigm from an information-theoretic perspective. The theoretical analysis shows that the proposed method achieves a tighter lower bound on the mutual information between the policy-optimization-relevant information and the representation produced by the encoder. To evaluate our model, we conduct extensive benchmark experiments and demonstrate the superior performance of our method over existing methods in pixel-observation environments.
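Since the abstract describes the contrastive objective only at a high level, the following is a minimal sketch of how a policy-guided contrastive loss over unlabeled candidate representations could look. The function name, the `policy_weights` input, and the soft-target weighting scheme are assumptions made for illustration, not the authors' published implementation.

```python
# Illustrative sketch (PyTorch) of a policy-guided contrastive loss over
# unlabeled candidates, in the spirit of the abstract. Names and the exact
# weighting scheme are assumptions, not the paper's implementation.
import torch
import torch.nn.functional as F

def pu_policy_contrast(anchor, candidates, policy_weights, temperature=0.1):
    """InfoNCE-style loss where the positive signal is soft and policy-derived.

    anchor:         (B, D) encoded anchor observations
    candidates:     (B, K, D) encoded unlabeled candidates (an unknown mix of
                    positives and negatives, e.g. augmented or nearby frames)
    policy_weights: (B, K) nonnegative scores from the current policy estimating
                    how relevant each candidate is to policy optimization
    """
    anchor = F.normalize(anchor, dim=-1)
    candidates = F.normalize(candidates, dim=-1)

    # Cosine similarities between each anchor and its K candidates.
    logits = torch.einsum('bd,bkd->bk', anchor, candidates) / temperature

    # Normalize policy scores into a soft target distribution over candidates,
    # so candidates the policy deems relevant act as (unlabeled) positives.
    targets = policy_weights / policy_weights.sum(dim=-1, keepdim=True).clamp(min=1e-6)

    # Soft-target cross-entropy over the candidate set: a standard
    # contrastive (InfoNCE-style) objective with policy-guided positives.
    return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```

InfoNCE-style objectives of this form are known to lower-bound the mutual information between the anchor and its positive representations, which is the quantity the abstract's tighter-bound claim concerns.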
Keywords: Reinforcement Learning, Self-Supervised Learning, Contrastive Learning