affiliation not provided to SSRN
Keywords: reinforcement learning, Q-learning, maximization bias, Q value decomposition, linear function approximation