A Sequential Data Extractor for Adaptive Tradeoff between Exploration and Exploitation in Reinforcement Learning

32 Pages | Posted: 16 Jan 2025

Yinglong Dai

Hunan Normal University

Zhi Yi

Hunan Normal University

Ming Chen

Hunan Normal University

Ying Liang

Central South University of Forestry and Technology

Lianming Zhang

Hunan Normal University

Abstract

Extracting effective information from sequential data is a major task in machine learning. In Reinforcement Learning (RL), the distribution of sampled sequential data often deviates from that of task-driven data because of agents' excessive exploration, which is particularly pronounced in environments with sparse rewards. Learning optimal policies from such sampled data therefore poses a significant challenge. This paper proposes an Extractor for Adaptive Tradeoff Between Exploration and Exploitation (EATBEE) to tackle this challenge. We visually compare the originally sampled data with the task-driven data distribution to illustrate both the degree of deviation between the two datasets and how EATBEE identifies and extracts data that helps the agent accomplish the task. Monotonic policy improvement is theoretically validated under the assumption that EATBEE ensures a high degree of trajectory similarity before and after improvement. Moreover, EATBEE serves as an independent module that can be seamlessly integrated with most RL algorithms. We substantiate the efficacy and practical applicability of EATBEE through experiments in both discrete and continuous environments. EATBEE achieves an effective tradeoff between exploration and exploitation by learning the distribution of task-driven data.
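As a rough illustration of the plug-in idea described in the abstract, the sketch below shows how an independent data-extraction module could sit between environment rollouts and an off-the-shelf learner's replay buffer, forwarding only trajectories that clear an adaptive return threshold. This is a hypothetical Python sketch, not the EATBEE algorithm itself; the class name TrajectoryExtractor, the keep_ratio and window parameters, and the return-quantile scoring rule are illustrative assumptions rather than details taken from the paper.

```python
import random
from collections import deque

class TrajectoryExtractor:
    """Hypothetical sketch of a sequential-data extractor placed between
    environment rollouts and the learner's replay buffer. It is NOT the
    paper's EATBEE method; it only illustrates the plug-in idea of keeping
    trajectories that look task-driven and discarding the rest."""

    def __init__(self, keep_ratio=0.5, window=100):
        self.keep_ratio = keep_ratio                  # fraction of recent trajectories treated as useful
        self.recent_returns = deque(maxlen=window)    # running window for the adaptive threshold

    def _threshold(self):
        # Adaptive cutoff: the (1 - keep_ratio) quantile of recently observed returns.
        if not self.recent_returns:
            return float("-inf")                      # keep everything until statistics exist
        ordered = sorted(self.recent_returns)
        idx = int((1.0 - self.keep_ratio) * (len(ordered) - 1))
        return ordered[idx]

    def extract(self, trajectory):
        """Return the trajectory if it clears the adaptive threshold, else None."""
        ret = sum(step["reward"] for step in trajectory)
        cutoff = self._threshold()
        self.recent_returns.append(ret)
        return trajectory if ret >= cutoff else None


# Usage: filter rollouts before handing them to any off-the-shelf RL learner.
if __name__ == "__main__":
    extractor = TrajectoryExtractor(keep_ratio=0.5)
    replay_buffer = []
    for episode in range(10):
        # Dummy rollout: random rewards stand in for real environment interaction.
        trajectory = [{"reward": random.random()} for _ in range(20)]
        kept = extractor.extract(trajectory)
        if kept is not None:
            replay_buffer.extend(kept)                # only "beneficial" data reaches the learner
    print(f"transitions kept: {len(replay_buffer)}")
```

Because the module only filters data before it reaches the learner, any scoring rule that compares sampled trajectories with the task-driven distribution could replace the simple return quantile used here, which is what makes such an extractor algorithm-agnostic.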

Keywords: sequential data, reinforcement learning, optimal policies, adaptive tradeoff, exploration and exploitation

Suggested Citation

Dai, Yinglong and Yi, Zhi and Chen, Ming and Liang, Ying and Zhang, Lianming, A Sequential Data Extractor for Adaptive Tradeoff between Exploration and Exploitation in Reinforcement Learning. Available at SSRN: https://ssrn.com/abstract=5099963 or http://dx.doi.org/10.2139/ssrn.5099963

Yinglong Dai

Hunan Normal University ( email )

No. 36, Lushan Road
Yuelu District
Changsha, 410001
China

Zhi Yi

Hunan Normal University ( email )

No. 36, Lushan Road
Yuelu District
Changsha, 410001
China

Ming Chen

Hunan Normal University ( email )

No. 36, Lushan Road
Yuelu District
Changsha, 410001
China

Ying Liang

Central South University of Forestry and Technology ( email )

China

Lianming Zhang (Contact Author)

Hunan Normal University ( email )

No. 36, Lushan Road
Yuelu District
Changsha, 410001
China
