Changsha, Hunan 410073
China
National University of Defense Technology
opponent hidden information inference, state estimation, stable feature, action model, Texas Hold'em
Overestimation reduction, Multi-agent Operator switching, Value averaging, Reinforcement Learning.
Multi-agent Interpretability, Risk-sensitive Reinforcement Learning Cooperative Policy.
Multi-agent, Adaptive risk attitudes, Distributional, Reinforcement learning, Risk-sensitive.