Mastering Visual Reinforcement Learning Via Positive Unlabeled Policy-Guided Contrast

33 Pages · Posted: 16 Dec 2024

Zehua Zang

affiliation not provided to SSRN

Qirui Ji

affiliation not provided to SSRN

Kai Li

affiliation not provided to SSRN

Rui Wang

affiliation not provided to SSRN

Lixiang Liu

affiliation not provided to SSRN

Fuchun Sun

Tsinghua University

Abstract

Reinforcement learning has recently received significant attention. A fundamental yet challenging problem in this learning paradigm is perceiving high-dimensional environment information, which has given rise to visual reinforcement learning: learning representations from pixel observations for policy optimization. In this article, we examine the frameworks of benchmark methods and identify a long-standing paradox challenging current methods: depending on the training phase, exploring visual semantic information can either improve the learned feature representations or prevent them from improving. In practice, we further show that an over-redundancy issue generally stalls gains in sample efficiency among baseline methods. To remedy this deficiency of existing methods, we introduce a novel plug-and-play method for visual reinforcement learning. Our model employs positive unlabeled policy-guided contrast to jointly learn anti-redundant and policy-optimization-relevant pixel semantic information during training. To elucidate the proposed method's advantages, we revisit the visual reinforcement learning paradigm from an information-theoretic perspective. Our theoretical analysis proves that the proposed method achieves a tighter lower bound on the mutual information between policy-optimization-relevant information and the representation derived by the encoder. To evaluate our model, we conduct extensive benchmark experiments and demonstrate the superior performance of our method over existing methods in pixel-observation environments.
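As a rough illustration of the positive-unlabeled contrastive idea described in the abstract, the sketch below shows an InfoNCE-style loss with a class-prior correction that discounts the positive mass hidden among unlabeled replay-buffer samples, so they are not all pushed away as negatives. The function name, the `prior` parameter, and the specific correction are illustrative assumptions for exposition only, not the paper's exact formulation.

```python
import numpy as np

def info_nce_pu(anchor, positive, unlabeled, prior=0.3, tau=0.1):
    """Positive-unlabeled InfoNCE-style loss (illustrative sketch).

    anchor, positive: (d,) embeddings of an observation and its
        policy-relevant augmented pair.
    unlabeled: (n, d) embeddings drawn from the replay buffer; under the
        PU view, a fraction `prior` of them are actually positives.
    """
    def norm(x):
        # L2-normalize so dot products become cosine similarities.
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    a, p, u = norm(anchor), norm(positive), norm(unlabeled)
    pos_sim = np.exp(a @ p / tau)   # similarity to the known positive
    u_sims = np.exp(u @ a / tau)    # similarities to unlabeled samples
    # PU correction: subtract the expected positive mass from the
    # unlabeled similarities before treating them as negatives.
    neg_mass = np.maximum(u_sims.mean() - prior * pos_sim, 1e-8)
    return -np.log(pos_sim / (pos_sim + len(u) * neg_mass))
```

The loss decreases as the anchor and its positive align, while the prior-weighted subtraction keeps likely-positive unlabeled samples from being repelled as hard negatives.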

Keywords: Reinforcement Learning, Self-supervised Learning, Contrastive Learning

Suggested Citation

Zang, Zehua and Ji, Qirui and Li, Kai and Wang, Rui and Liu, Lixiang and Sun, Fuchun, Mastering Visual Reinforcement Learning Via Positive Unlabeled Policy-Guided Contrast. Available at SSRN: https://ssrn.com/abstract=5059981 or http://dx.doi.org/10.2139/ssrn.5059981

Rui Wang (Contact Author)

affiliation not provided to SSRN

Fuchun Sun

Tsinghua University
Beijing, 100084
China

Paper statistics: 108 Downloads · 344 Abstract Views · Rank 553,149