Optimal Ancillary Service Disaggregation for EV Charging Station Aggregators: A Hybrid On–Off Policy Reinforcement Learning Framework
24 Pages Posted: 17 Jan 2025
There are 2 versions of this paper
Abstract
With the escalating adoption of electric vehicles (EVs), EV charging stations (EVCSs) can be aggregated into an EVCS aggregator (EVCSA) to provide ancillary services to the power system. Once aggregated, the EVCSA must optimally disaggregate the regulation commands from the power system operator (PSO) to individual EVCSs during dispatch. This process faces two main challenges: 1) the large number and diversity of EVCSs complicate the optimization problem; 2) the inherent uncertainty in EVCS adjustable capacity causes discrepancies in the disaggregation scheme. This paper introduces a novel hybrid on-off policy reinforcement learning (RL) framework to optimize ancillary service disaggregation in complex and uncertain environments. Traditional RL strategies either follow current policies too rigidly (i.e., on-policy methods) or rely on high-quality samples (i.e., off-policy methods), limiting exploration efficiency in the diverse and complex EVCS environment. To address the first challenge, we propose a hybrid on-off policy exploration strategy that combines the high-quality sampling of on-policy methods with the efficient learning of off-policy methods, thus improving exploration efficiency on a complex optimization problem. To tackle the second challenge, an integrated soft actor-critic (ISAC) RL algorithm with a physics-informed long short-term memory (PILSTM) prediction model is introduced. Economic laws are incorporated into the loss function of this model to mine the uncertain time-series adjustable capacity data, enhancing model interpretability. Additionally, by integrating SAC’s dual-Q networks with its policy network, the ISAC algorithm accelerates convergence under uncertainty. Simulation results demonstrate that the proposed framework outperforms existing model-free RL methods and model-based optimization methods. The scalability of the approach is further validated through comparative analyses.
Keywords: Ancillary service disaggregation, electric vehicle charging station aggregator, on-off policy exploration, integrated soft actor-critic, physics-informed long short-term memory