A Sequential Data Extractor for Adaptive Tradeoff between Exploration and Exploitation in Reinforcement Learning
32 Pages
Posted: 16 Jan 2025
Abstract
Extracting effective information from sequential data is a central task in machine learning. In Reinforcement Learning (RL) in particular, the distribution of sampled sequential data often deviates from that of task-driven data because agents explore excessively, a problem that is especially pronounced in environments with sparse rewards. Learning optimal policies from such sampled sequential data therefore poses a significant challenge. This paper proposes an Extractor for Adaptive Tradeoff Between Exploration and Exploitation (EATBEE) to tackle this challenge. We visually compare the originally sampled data with the task-driven data distribution to illustrate both the degree of deviation between the two datasets and how EATBEE identifies and extracts the data that helps the agent accomplish the task. Monotonic policy improvement is theoretically validated under the assumption that EATBEE ensures a high degree of trajectory similarity before and after each improvement step. Additionally, EATBEE serves as an independent module that can be seamlessly integrated with most RL algorithms. We substantiate the efficacy and practical applicability of EATBEE through experiments in both discrete and continuous environments. EATBEE achieves an effective tradeoff between exploration and exploitation by learning the distribution of task-driven data.
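The abstract does not spell out EATBEE's extraction criterion, but the core idea of filtering sampled trajectories toward the task-driven distribution can be illustrated with a minimal sketch. Everything below is a hypothetical construction for illustration only: the return-based similarity score, the SequentialDataExtractor class, and the keep_quantile parameter are assumptions, not the paper's actual method.

```python
import numpy as np

def trajectory_return(traj):
    """Sum of rewards along a trajectory of (state, action, reward) tuples."""
    return sum(r for _, _, r in traj)

class SequentialDataExtractor:
    """Hypothetical EATBEE-style extractor: keep sampled trajectories whose
    returns are close to a task-driven reference set, discarding data
    produced by excessive exploration. The similarity measure and threshold
    are assumptions; the abstract does not specify the paper's criterion."""

    def __init__(self, reference_returns, keep_quantile=0.5):
        self.reference = np.asarray(reference_returns, dtype=float)
        self.keep_quantile = keep_quantile  # fraction of the batch to keep

    def score(self, traj):
        # Negative distance to the nearest reference return; higher = more task-like.
        return -np.min(np.abs(self.reference - trajectory_return(traj)))

    def extract(self, sampled_trajs):
        # Keep the most task-like fraction of the sampled batch.
        scores = np.array([self.score(t) for t in sampled_trajs])
        cutoff = np.quantile(scores, 1.0 - self.keep_quantile)
        return [t for t, s in zip(sampled_trajs, scores) if s >= cutoff]
```

Under this sketch, the extracted subset would simply replace the raw sampled batch before any policy update, which is one plausible way such an extractor could sit as an independent module in front of most RL algorithms, as the abstract claims.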
Keywords: sequential data, reinforcement learning, optimal policies, adaptive tradeoff, exploration and exploitation