Increasing Scalability in Algorithms for Centralized and Decentralized Partially Observable Markov Decision Processes

Thursday, 11/13/2008 10:30am to 12:00pm
Ph.D. Dissertation Proposal Defense

Christopher Amato

Computer Science Building, Room 151

Developing effective frameworks for reasoning under uncertainty is a thriving research area in artificial intelligence. When a single decision maker is present, the partially observable Markov decision process (POMDP) model is a popular and powerful choice. When choices are made in a decentralized manner by a set of decision makers, the problem can be modeled as a decentralized partially observable Markov decision process (DEC-POMDP). While POMDPs and DEC-POMDPs offer a rich framework for sequential decision making under uncertainty, the computational complexity of each model presents an important research challenge.
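For readers unfamiliar with the models, the standard tuple definitions can be sketched as follows (notation is the conventional one from the literature, not taken from this abstract):

```latex
% A POMDP is typically defined as a tuple
\langle S, A, P, R, \Omega, O \rangle
% where S is a set of states, A a set of actions,
% P(s' \mid s, a) the state-transition function,
% R(s, a) the reward function, \Omega a set of observations,
% and O(o \mid s', a) the observation function.

% A DEC-POMDP generalizes this to a set of agents I:
\langle I, S, \{A_i\}, P, R, \{\Omega_i\}, O \rangle
% with joint actions \vec{a} = \langle a_1, \ldots, a_n \rangle and joint
% observations \vec{o} = \langle o_1, \ldots, o_n \rangle; each agent i
% observes only its own o_i, which is the source of the added complexity.
```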
To address this high complexity, we have studied two methods: optimizing fixed-size solutions and exploiting the goal structure present in many DEC-POMDPs. Fixed-size solutions can effectively address the intractable memory requirements of current algorithms for both models. While other approaches have relied on search and optimization, we have described a way to define an optimal policy of a desired size as a nonlinear program (NLP). We have shown that even when this optimal representation is intractable to solve exactly, approximate solutions can outperform current state-of-the-art POMDP and DEC-POMDP algorithms. Our goal-based work provides an optimal solution for DEC-POMDPs under the assumptions that designated actions cause the problem to terminate and that all non-terminal actions incur negative rewards. When these assumptions are weakened, our approximate sample-based algorithm produces significantly better solutions than previous approaches. We have also derived bounds for this algorithm on the number of samples needed to approach optimality.
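As a rough illustration of the fixed-size approach, consider a single-agent POMDP whose policy is a stochastic finite-state controller with nodes q, action-selection probabilities P(a|q), and node-transition probabilities P(q'|q,a,o). One way such an NLP can be posed (a sketch under these assumed variable names, with initial belief b_0, initial node q_0, and discount factor gamma) is:

```latex
\begin{align*}
\max \quad & \sum_{s} b_0(s)\, V(q_0, s) \\
\text{s.t.} \quad & V(q,s) = \sum_{a} P(a \mid q) \Big[ R(s,a)
  + \gamma \sum_{s'} P(s' \mid s,a) \sum_{o} O(o \mid s',a)
    \sum_{q'} P(q' \mid q,a,o)\, V(q',s') \Big]
    \quad \forall\, q, s \\
& \sum_{a} P(a \mid q) = 1 \quad \forall\, q, \qquad
  \sum_{q'} P(q' \mid q,a,o) = 1 \quad \forall\, q, a, o \\
& P(a \mid q) \ge 0, \qquad P(q' \mid q,a,o) \ge 0
\end{align*}
```

The first constraint is the Bellman equation for the controller's value, made nonlinear because the probabilities and the values V(q,s) are both decision variables; the remaining constraints simply require valid probability distributions. The controller size (number of nodes) fixes the memory used by the policy, which is what makes the formulation tractable in memory even for long horizons.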
In the future, we expect to develop a method that improves the performance of the optimal DEC-POMDP algorithm by using policy information to reduce the necessary search space. This method would not only allow larger problems to be solved optimally, but would also increase the scalability of several approximate algorithms that build on the optimal approach. Furthermore, this approach could identify natural lower-complexity subclasses in which the agents are limited in how they can affect the environment. We also plan to use a set of attributes, or landmarks, to simplify the generation of solutions for POMDPs and DEC-POMDPs. These could be determined automatically or provided by the user, and would reduce the solution space by eliminating impossible or undesirable outcomes. Lastly, to better evaluate these contributions, we will develop more practical real-world domains on which to test algorithms.
Advisor: Shlomo Zilberstein