Multiagent Learning for Black Box System Reward Functions

Advances in Complex Systems, Vol. 12, Nos. 4-5, pp. 475-492, 2009

Posted: 26 Apr 2010

Date Written: August 1, 2009

Abstract

In large, distributed systems composed of adaptive and interactive components (agents), ensuring coordination among the agents so that the system achieves its performance objectives is a challenging proposition. The key difficulty to overcome in such systems is one of credit assignment: how to apportion credit (or blame) to a particular agent based on the performance of the entire system. In this paper, we show how this problem can be solved in general for a large class of reward functions whose analytical form may be unknown (hence "black box" rewards). Our method combines the salient features of global solutions (e.g., "team games"), which are broadly applicable but provide poor solutions in large problems, with those of local solutions (e.g., "difference rewards"), which learn quickly but can be computationally burdensome. We introduce two estimates of local rewards for a class of problems in which the mapping from agent actions to the system reward can be decomposed into a linear combination of nonlinear functions of the agents' actions. We test our method on a distributed marketing problem and an air traffic flow management problem, showing a 44% performance improvement over team games and a speedup of order n over difference rewards (for an n-agent system).
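
The sketch below illustrates the difference-reward idea that the abstract builds on, and why exact difference rewards are computationally burdensome for a black-box reward: each agent's reward requires an extra evaluation of the system reward G with that agent's action replaced by a counterfactual, an order-n cost per learning step. It is not the paper's estimator; the reward function `system_reward` (a linear combination of tanh nonlinearities, matching the class of decomposable rewards the abstract describes), the zero counterfactual, and the helper name `difference_reward` are all illustrative assumptions.

```python
# A minimal sketch of the exact difference reward for a black-box system
# reward, not the paper's estimation method. All names and the specific
# reward function are illustrative assumptions.
import numpy as np

def system_reward(z):
    # Hypothetical black-box reward: a linear combination of nonlinear
    # functions of the agents' actions (weights and tanh are made up here,
    # but the structure matches the class the abstract describes).
    weights = np.linspace(1.0, 2.0, len(z))
    return float(np.dot(weights, np.tanh(z)))

def difference_reward(G, z, i, counterfactual=0.0):
    # D_i = G(z) - G(z_{-i}), where z_{-i} replaces agent i's action with
    # a fixed counterfactual. Computing D_i exactly costs one extra call
    # to G per agent, i.e., order n extra evaluations for n agents; the
    # paper's local-reward estimates aim to avoid this overhead.
    z_minus_i = np.array(z, dtype=float)
    z_minus_i[i] = counterfactual
    return G(z) - G(z_minus_i)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=5)  # joint action of a 5-agent system
    for i in range(len(z)):
        print(f"agent {i}: D = {difference_reward(system_reward, z, i):+.4f}")
```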

Keywords: Multiagent learning, black box reward functions, multiagent coordination

Suggested Citation

Tumer, Kagan, Multiagent Learning for Black Box System Reward Functions (August 1, 2009). Advances in Complex Systems, Vol. 12, Nos. 4-5, pp. 475-492, 2009. Available at SSRN: https://ssrn.com/abstract=1504438

Kagan Tumer (Contact Author)

Oregon State University

Bexell Hall 200
Corvallis, OR 97331
United States
