# Influence of modeling structure in probabilistic sequential decision problems

Florent Teichteil-Königsbuch; Patrick Fabiani

RAIRO - Operations Research (2006)

- Volume: 40, Issue: 2, page 195-234
- ISSN: 0399-0559

## Access Full Article

top## Abstract

top## How to cite

topTeichteil-Königsbuch, Florent, and Fabiani, Patrick. "Influence of modeling structure in probabilistic sequential decision problems ." RAIRO - Operations Research 40.2 (2006): 195-234. <http://eudml.org/doc/249768>.

@article{Teichteil2006,

abstract = {
Markov Decision Processes (MDPs) are a classical framework for
stochastic sequential decision problems, based on an enumerated state
space representation. More compact and structured representations have
been proposed: factorization techniques use state variables
representations, while decomposition techniques are based on a
partition of the state space into sub-regions and take advantage of
the resulting structure of the state transition graph. We use a family
of probabilistic exploration-like planning problems in order to study
the influence of the modeling structure on the MDP solution. We first
discuss the advantages and drawbacks of a graph based representation
of the state space, then present our comparisons of two decomposition
techniques, and propose to use a global approach combining both state
space factorization and decomposition techniques. On the exploration
problem instance, it is proposed to take advantage of the natural
topological structure of the navigation space, which is partitioned
into regions. A number of local policies are optimized within each
region, that become the macro-actions of the global abstract MDP
resulting from the decomposition. The regions are the corresponding
macro-states in the abstract MDP. The global abstract MDP is obtained
in a factored form, combining all the initial MDP state variables and
one macro-state “region” variable standing for the different possible
macro-states corresponding to the regions. Further research is
presently conducted on efficient solution algorithms implementing the
same hybrid approach for tackling large size MDPs.
},

author = {Teichteil-Königsbuch, Florent, Fabiani, Patrick},

journal = {RAIRO - Operations Research},

keywords = {Probabilistic planning; dynamic programming; Markov decision
processes; application to autonomous decision making. ; probabilistic planning; Markov Decision Processes},

language = {eng},

month = {10},

number = {2},

pages = {195-234},

publisher = {EDP Sciences},

title = {Influence of modeling structure in probabilistic sequential decision problems },

url = {http://eudml.org/doc/249768},

volume = {40},

year = {2006},

}

TY - JOUR

AU - Teichteil-Königsbuch, Florent

AU - Fabiani, Patrick

TI - Influence of modeling structure in probabilistic sequential decision problems

JO - RAIRO - Operations Research

DA - 2006/10//

PB - EDP Sciences

VL - 40

IS - 2

SP - 195

EP - 234

AB -
Markov Decision Processes (MDPs) are a classical framework for
stochastic sequential decision problems, based on an enumerated state
space representation. More compact and structured representations have
been proposed: factorization techniques use state variables
representations, while decomposition techniques are based on a
partition of the state space into sub-regions and take advantage of
the resulting structure of the state transition graph. We use a family
of probabilistic exploration-like planning problems in order to study
the influence of the modeling structure on the MDP solution. We first
discuss the advantages and drawbacks of a graph based representation
of the state space, then present our comparisons of two decomposition
techniques, and propose to use a global approach combining both state
space factorization and decomposition techniques. On the exploration
problem instance, it is proposed to take advantage of the natural
topological structure of the navigation space, which is partitioned
into regions. A number of local policies are optimized within each
region, that become the macro-actions of the global abstract MDP
resulting from the decomposition. The regions are the corresponding
macro-states in the abstract MDP. The global abstract MDP is obtained
in a factored form, combining all the initial MDP state variables and
one macro-state “region” variable standing for the different possible
macro-states corresponding to the regions. Further research is
presently conducted on efficient solution algorithms implementing the
same hybrid approach for tackling large size MDPs.

LA - eng

KW - Probabilistic planning; dynamic programming; Markov decision
processes; application to autonomous decision making. ; probabilistic planning; Markov Decision Processes

UR - http://eudml.org/doc/249768

ER -

## References

top- D. Aberdeen, S. Thiébaux and L. Zhang, Decision-theoretic military operations planning, in Proceedings of 14th Conf. ICAPS 2004, Whistler, Canada (2004) 402–412.
- R. Bellman, Dynamic Programming. Princeton University Press, Princeton, NJ (1957).
- D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming: an overview (1995).
- B. Bonet and H. Geffner, Labeled rtdp: Improving the convergence of real-time dynamic programming, in Proceedings of 13th Conf. ICAPS 2003, Trento, Italy (2003) 12–21.
- C. Boutilier and D. Poole, Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, USA, AAAI Press / The MIT Press (1996) 1168–1175.
- C. Boutilier, Correlated action effects in decision theoretic regression, in Uncertainty in Artificial Intelligence (1997) 30–37.
- C. Boutilier, R.I. Brafman and C. Geib, Structured reachability analysis for Markov decision processes, in Uncertainty in Artificial Intelligence (1998) 24–32.
- C. Boutilier, T. Dean and S. Hanks, Decision-theoretic planning: Structural assumptions and computational leverage. J. Artificial Intell. Res.11 (1999) 1–94. Zbl0918.68110
- C. Boutilier, R. Dearden and M. Goldszmidt, Stochastic dynamic programming with factored representations. Artificial Intell.121 (2000) 49–107. Zbl0948.68167
- A.R. Cassandra, Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. Computer science, U. of Illinois, Providence R.I. (1998).
- T. Dean and K. Kanazawa, A model for reasoning about persistence and causation. Computational Intelligence5 (1989) 142–150.
- T. Dean and S.-H. Lin, Decomposition techniques for planning in stochastic domains, in Proceedings of the 14th IJCAI 1995, San Francisco, CA (1995) 1121–1129.
- R. Dearden and C. Boutilier, Abstraction and approximate decision-theoretic planning. Artificial Intell.89 (1997) 219–283. Zbl1042.68669
- T.G. Dietterich, Hierarchical reinforcement learning using the maxq value function decomposition. J. Artificial Intell. Res.13 (2000) 227–303. Zbl0963.68085
- I.S. Duff, A survey of sparse matrix research, in Proceedings of the IEEE, Prentice Hall, New York 65 (1977) 500–535.
- I.S. Duff, A.M. Erisman and J.K. Reid, Direct Methods for Sparse Matrices. Clarendon Press, Oxford (1986). Zbl0604.65011
- A. Dutech, Solving pomdp's using selected past events, in Proceedings of the 14th ECAI 2000, Berlin, Germany (July 2000) 281–285.
- P. Fabiani and F. Teichteil-Königsbuch, Symbolic heuristic policy iteration algorithms for structured decision-theoretic exploration problems, in Workshop on Reasoning under Uncertainty in Robotics - RUR'2005, Edinburgh, Scotland (2005). Zbl1112.90092
- Z. Feng and E. Hansen, Symbolic heuristic search for factored markov decision processes, in Proceedings of 18th Conf. AAAI 2002, Edmonton, Alberta, Canada (2002) 455–460.
- Z. Feng, E.A. Hansen and S. Zilberstein, Symbolic generalization for on-line planning, in Proceedings of 19th Conf. UAI 2003, Acapulco, Mexico (2003) 209–216.
- C. Guestrin, M. Hauskrecht and B. Kveton, Solving factored mdps with continuous and discrete variables, in Proceedings of 20th Conf. UAI 2004, Banff, Canada (2004). Zbl1182.68252
- C. Guestrin, D. Koller, R. Parr and S. Venkataraman, Efficient solution algorithms for factored mdps. J. Artificial Intell. Res.19 (2003) 399–468. Zbl1026.68125
- E.A. Hansen and S. Zilberstein, Lao*: A heuristic search algorithm that finds solutions with loops. Artificial Intell.129 (2001) 35–62. Zbl0971.68036
- M. Hauskrecht, N. Meuleau, L.P. Kaelbling, T.L. Dean and C. Boutilier, Hierarchical solution of markov decision processes using macro-actions, in Proceedings of 14th Conf. UAI 1998, San Francisco, CA (1998) 220–229.
- J. Hoey, R. St-Aubin, A. Hu and C. Boutilier, Spudd: Stochastic planning using decision diagrams, in Proceedings of 15th Conf. UAI 1999, San Francisco, CA (1999) 279–288.
- J. Hoey, R. St-Aubin, A. Hu and C. Boutilier, Optimal and approximate stochastic planning using decision diagrams. Technical Report TR-2000-05, University of British Columbia, 10 (2000).
- K.-E. Kim and T. Dean, Solving factored mdps using non-homogeneous partitions. Artificial Intell.147 (2003) 225–251. Zbl1082.68804
- B. Kveton and M. Hauskrecht, Heuristic refinements of approximate linear programming for factored continuous-state markov decision processes, in Proceedings of 14th Conf. ICAPS 2004, Whistler, Canada (2004) 306–314.
- S.M. Lavalle, A Game-Theoretic Framework for Robot Motion Planning. Electrical engineering, University of Illinois, Urbana-Champaign (1995).
- W.S. Lovejoy, A survey of algorithmic methods for partially observed markov decision processes. Technical Report 28, Annals of Operation Research (1991). Zbl0717.90086
- R. Parr, Flexible decomposition algorithms for weakly coupled markov decision problems, in Proceedings of 14th Conf. UAI 1998, San Francisco, CA (1998) 422–430.
- J. Pineau, G. Gordon and S. Thrun, Policy-contingent abstraction for robust robot control, in Conference on Uncertainty in Articifical Intelligence (UAI) (2003) 477–484.
- M.L. Puterman, Markov Decision Processes. John Wiley & Sons, INC (1994). Zbl0829.90134
- R.I. Bahar, E.A. Frohm, C.M. Gaona, G.D. Hachtel, E. Macii, A. Pardo and F. Somenzi, Algebraic Decision Diagrams and Their Applications, in IEEE /ACM International Conference on CAD, Santa Clara, California, 1993. IEEE Computer Society Press 188–191.
- Y. Saad, Iterative Methods for Sparse Linear Systems. Society of Industrial and Applied Mathematics, second edition (2003). Zbl1031.65046
- R. Sabbadin, Graph partitioning techniques for markov decision processes decomposition, in Proceedings of the 15th ECAI 2002, Lyon, France (July 2002) 670–674.
- R. St-Aubin, J. Hoey and C. Boutilier, APRICODD: Approximate policy construction using decision diagrams, in NIPS (2000) 1089–1095.
- R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998).
- F. Teichteil-Königsbuch and P. Fabiani, Processus Décisionnel de Markov décomposé et factorisé pour l'optimisation d'une stratégie d'exploration. Revue d'Intelligence Artificielle20 (2006) 133-179.
- F. Teichteil-Königsbuch and P. Fabiani, Symbolic heuristic policy iteration algorithms for structured decision-theoretic exploration problems, in Workshop on Planning under Uncertainty for Autonomous Systems ICAPS'2005, Monterey, CA, USA (2005). Zbl1112.90092
- G. Verfaillie, F. Garcia and L. Péret, Deployment and Maintenance of a Constellation of Satellites: a Benchmark, in Proceedings of ICAPS'03 Workshop on Planning under Uncertainty and Incomplete Information, Trento, Italy (June 2003).

## NotesEmbed ?

topTo embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.