Partially Observable Markov Decision Processes: A Geometric Technique and Analysis

Operations Research 58(1):214-228. 2010

Posted: 4 Apr 2012

Hao Zhang

UBC Sauder School of Business

Date Written: May 25, 2009

Abstract

This paper presents a novel framework for studying partially observable Markov decision processes (POMDPs) with finite state, action, and observation sets and discounted rewards. The new framework is based solely on the future-reward vectors associated with future policies, making it more parsimonious than the traditional framework based on belief vectors. It reveals a connection between the POMDP problem and two computational geometry problems, namely finding the vertices of a convex hull and computing the Minkowski sum of convex polytopes, a connection that can help solve POMDPs more efficiently. The new framework clarifies and sheds new light on existing algorithms over both finite and infinite horizons. It also facilitates the comparison of POMDPs by their degree of observability, a useful structural result.
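The two geometric primitives the abstract names can be illustrated concretely in two dimensions. The sketch below is not the paper's algorithm; it is a minimal, self-contained illustration that computes convex hull vertices with Andrew's monotone chain and forms a Minkowski sum by the brute-force route (convex hull of all pairwise vertex sums), rather than the more efficient edge-merging method. All function names are illustrative.

```python
def cross(o, a, b):
    # Cross product of vectors OA and OB; positive if OAB turns counter-clockwise.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    # Andrew's monotone chain: returns hull vertices in counter-clockwise
    # order, starting from the lexicographically smallest point.
    pts = sorted(set(points))
    if len(pts) <= 1:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def minkowski_sum(P, Q):
    # Brute-force Minkowski sum of two convex polygons (vertex lists):
    # the sum equals the convex hull of all pairwise vertex sums.
    sums = [(p[0] + q[0], p[1] + q[1]) for p in P for q in Q]
    return convex_hull(sums)

# Example: the Minkowski sum of a unit square with itself is a 2x2 square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(minkowski_sum(square, square))
```

The brute-force sum is O(|P||Q| log |P||Q|) but easy to verify; for convex polygons the edge-merging algorithm achieves O(|P| + |Q|).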

Keywords: dynamic programming, partially observable Markov decision processes, learning, artificial intelligence

Suggested Citation

Zhang, Hao, Partially Observable Markov Decision Processes: A Geometric Technique and Analysis (May 25, 2009). Operations Research 58(1):214-228. 2010, Available at SSRN: https://ssrn.com/abstract=2034041

Hao Zhang (Contact Author)

UBC Sauder School of Business (email)

2053 Main Mall
Vancouver, BC V6T 1Z2
Canada
