Shanxi University
No. 92 Wucheng Rd
Taiyuan, 030006
China
Off-policy reinforcement learning, Sample efficiency, Exploration-exploitation trade-off, Cognitive consistency
interconnection network, arrangement graph, subnetwork reliability, probabilistic failure
Deep metric learning, embedding ensembles, intra-class diversity