A Sequential Data Extractor for Adaptive Tradeoff between Exploration and Exploitation in Reinforcement Learning
32 Pages
Posted: 16 Jan 2025
Abstract
Extracting effective information from sequential data is a central task in machine learning. In Reinforcement Learning (RL) in particular, the distribution of sampled sequential data often deviates from that of task-driven data because agents explore excessively, a problem that is especially pronounced in environments with sparse rewards. Learning optimal policies from such sampled sequential data therefore poses a significant challenge. This paper proposes an Extractor for Adaptive Tradeoff Between Exploration and Exploitation (EATBEE) to tackle this challenge. We visually compare the originally sampled data with the task-driven data distribution to illustrate both the degree of deviation between the two datasets and how EATBEE identifies and extracts the data that helps the agent accomplish the task. Monotonic policy improvement is theoretically validated under the assumption that EATBEE ensures a high degree of trajectory similarity before and after each improvement step. Additionally, EATBEE serves as an independent module that can be seamlessly integrated with most RL algorithms. We substantiate the efficacy and practical applicability of EATBEE through experiments in both discrete and continuous environments. EATBEE achieves an effective tradeoff between exploration and exploitation by learning the distribution of task-driven data.
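The abstract does not spell out EATBEE's extraction criterion, but the core idea of filtering sampled trajectories toward the task-driven distribution can be illustrated with a minimal sketch. Everything below is a hypothetical construction for illustration only: the return-based similarity score, the SequentialDataExtractor class, and the keep_quantile parameter are assumptions, not the paper's actual method.

```python
import numpy as np

def trajectory_return(traj):
    """Sum of rewards along a trajectory of (state, action, reward) tuples."""
    return sum(r for _, _, r in traj)

class SequentialDataExtractor:
    """Hypothetical EATBEE-style extractor: keep sampled trajectories whose
    returns are close to a task-driven reference set, discarding data
    produced by excessive exploration. The similarity measure and threshold
    are assumptions; the abstract does not specify the paper's criterion."""

    def __init__(self, reference_returns, keep_quantile=0.5):
        self.reference = np.asarray(reference_returns, dtype=float)
        self.keep_quantile = keep_quantile  # fraction of the batch to keep

    def score(self, traj):
        # Negative distance to the nearest reference return; higher = more task-like.
        return -np.min(np.abs(self.reference - trajectory_return(traj)))

    def extract(self, sampled_trajs):
        # Keep the most task-like fraction of the sampled batch.
        scores = np.array([self.score(t) for t in sampled_trajs])
        cutoff = np.quantile(scores, 1.0 - self.keep_quantile)
        return [t for t, s in zip(sampled_trajs, scores) if s >= cutoff]
```

Under this sketch, the extracted subset would simply replace the raw sampled batch before any policy update, which is one plausible way such an extractor could sit as an independent module in front of most RL algorithms, as the abstract claims.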
Keywords: sequential data, reinforcement learning, optimal policies, adaptive tradeoff, exploration and exploitation