Influence of modeling structure in probabilistic sequential decision problems

Florent Teichteil-Königsbuch; Patrick Fabiani

RAIRO - Operations Research (2006)

  • Volume: 40, Issue: 2, page 195-234
  • ISSN: 0399-0559

Abstract

top
Markov Decision Processes (MDPs) are a classical framework for stochastic sequential decision problems, based on an enumerated state space representation. More compact and structured representations have been proposed: factorization techniques use state variables representations, while decomposition techniques are based on a partition of the state space into sub-regions and take advantage of the resulting structure of the state transition graph. We use a family of probabilistic exploration-like planning problems in order to study the influence of the modeling structure on the MDP solution. We first discuss the advantages and drawbacks of a graph based representation of the state space, then present our comparisons of two decomposition techniques, and propose to use a global approach combining both state space factorization and decomposition techniques. On the exploration problem instance, it is proposed to take advantage of the natural topological structure of the navigation space, which is partitioned into regions. A number of local policies are optimized within each region, that become the macro-actions of the global abstract MDP resulting from the decomposition. The regions are the corresponding macro-states in the abstract MDP. The global abstract MDP is obtained in a factored form, combining all the initial MDP state variables and one macro-state “region” variable standing for the different possible macro-states corresponding to the regions. Further research is presently conducted on efficient solution algorithms implementing the same hybrid approach for tackling large size MDPs.

How to cite

top

Teichteil-Königsbuch, Florent, and Fabiani, Patrick. "Influence of modeling structure in probabilistic sequential decision problems ." RAIRO - Operations Research 40.2 (2006): 195-234. <http://eudml.org/doc/249768>.

@article{Teichteil2006,
abstract = { Markov Decision Processes (MDPs) are a classical framework for stochastic sequential decision problems, based on an enumerated state space representation. More compact and structured representations have been proposed: factorization techniques use state variables representations, while decomposition techniques are based on a partition of the state space into sub-regions and take advantage of the resulting structure of the state transition graph. We use a family of probabilistic exploration-like planning problems in order to study the influence of the modeling structure on the MDP solution. We first discuss the advantages and drawbacks of a graph based representation of the state space, then present our comparisons of two decomposition techniques, and propose to use a global approach combining both state space factorization and decomposition techniques. On the exploration problem instance, it is proposed to take advantage of the natural topological structure of the navigation space, which is partitioned into regions. A number of local policies are optimized within each region, that become the macro-actions of the global abstract MDP resulting from the decomposition. The regions are the corresponding macro-states in the abstract MDP. The global abstract MDP is obtained in a factored form, combining all the initial MDP state variables and one macro-state “region” variable standing for the different possible macro-states corresponding to the regions. Further research is presently conducted on efficient solution algorithms implementing the same hybrid approach for tackling large size MDPs. },
author = {Teichteil-Königsbuch, Florent, Fabiani, Patrick},
journal = {RAIRO - Operations Research},
keywords = {Probabilistic planning; dynamic programming; Markov decision processes; application to autonomous decision making. ; probabilistic planning; Markov Decision Processes},
language = {eng},
month = {10},
number = {2},
pages = {195-234},
publisher = {EDP Sciences},
title = {Influence of modeling structure in probabilistic sequential decision problems },
url = {http://eudml.org/doc/249768},
volume = {40},
year = {2006},
}

TY - JOUR
AU - Teichteil-Königsbuch, Florent
AU - Fabiani, Patrick
TI - Influence of modeling structure in probabilistic sequential decision problems
JO - RAIRO - Operations Research
DA - 2006/10//
PB - EDP Sciences
VL - 40
IS - 2
SP - 195
EP - 234
AB - Markov Decision Processes (MDPs) are a classical framework for stochastic sequential decision problems, based on an enumerated state space representation. More compact and structured representations have been proposed: factorization techniques use state variables representations, while decomposition techniques are based on a partition of the state space into sub-regions and take advantage of the resulting structure of the state transition graph. We use a family of probabilistic exploration-like planning problems in order to study the influence of the modeling structure on the MDP solution. We first discuss the advantages and drawbacks of a graph based representation of the state space, then present our comparisons of two decomposition techniques, and propose to use a global approach combining both state space factorization and decomposition techniques. On the exploration problem instance, it is proposed to take advantage of the natural topological structure of the navigation space, which is partitioned into regions. A number of local policies are optimized within each region, that become the macro-actions of the global abstract MDP resulting from the decomposition. The regions are the corresponding macro-states in the abstract MDP. The global abstract MDP is obtained in a factored form, combining all the initial MDP state variables and one macro-state “region” variable standing for the different possible macro-states corresponding to the regions. Further research is presently conducted on efficient solution algorithms implementing the same hybrid approach for tackling large size MDPs.
LA - eng
KW - Probabilistic planning; dynamic programming; Markov decision processes; application to autonomous decision making. ; probabilistic planning; Markov Decision Processes
UR - http://eudml.org/doc/249768
ER -

References

top
  1. D. Aberdeen, S. Thiébaux and L. Zhang, Decision-theoretic military operations planning, in Proceedings of 14th Conf. ICAPS 2004, Whistler, Canada (2004) 402–412.  
  2. R. Bellman, Dynamic Programming. Princeton University Press, Princeton, NJ (1957).  
  3. D. Bertsekas and J. Tsitsiklis, Neuro-dynamic programming: an overview (1995).  
  4. B. Bonet and H. Geffner, Labeled rtdp: Improving the convergence of real-time dynamic programming, in Proceedings of 13th Conf. ICAPS 2003, Trento, Italy (2003) 12–21.  
  5. C. Boutilier and D. Poole, Computing optimal policies for partially observable decision processes using compact representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, Portland, Oregon, USA, AAAI Press / The MIT Press (1996) 1168–1175.  
  6. C. Boutilier, Correlated action effects in decision theoretic regression, in Uncertainty in Artificial Intelligence (1997) 30–37.  
  7. C. Boutilier, R.I. Brafman and C. Geib, Structured reachability analysis for Markov decision processes, in Uncertainty in Artificial Intelligence (1998) 24–32.  
  8. C. Boutilier, T. Dean and S. Hanks, Decision-theoretic planning: Structural assumptions and computational leverage. J. Artificial Intell. Res.11 (1999) 1–94.  Zbl0918.68110
  9. C. Boutilier, R. Dearden and M. Goldszmidt, Stochastic dynamic programming with factored representations. Artificial Intell.121 (2000) 49–107.  Zbl0948.68167
  10. A.R. Cassandra, Exact and Approximate Algorithms for Partially Observable Markov Decision Processes. Computer science, U. of Illinois, Providence R.I. (1998).  
  11. T. Dean and K. Kanazawa, A model for reasoning about persistence and causation. Computational Intelligence5 (1989) 142–150.  
  12. T. Dean and S.-H. Lin, Decomposition techniques for planning in stochastic domains, in Proceedings of the 14th IJCAI 1995, San Francisco, CA (1995) 1121–1129.  
  13. R. Dearden and C. Boutilier, Abstraction and approximate decision-theoretic planning. Artificial Intell.89 (1997) 219–283.  Zbl1042.68669
  14. T.G. Dietterich, Hierarchical reinforcement learning using the maxq value function decomposition. J. Artificial Intell. Res.13 (2000) 227–303.  Zbl0963.68085
  15. I.S. Duff, A survey of sparse matrix research, in Proceedings of the IEEE, Prentice Hall, New York 65 (1977) 500–535.  
  16. I.S. Duff, A.M. Erisman and J.K. Reid, Direct Methods for Sparse Matrices. Clarendon Press, Oxford (1986).  Zbl0604.65011
  17. A. Dutech, Solving pomdp's using selected past events, in Proceedings of the 14th ECAI 2000, Berlin, Germany (July 2000) 281–285.  
  18. P. Fabiani and F. Teichteil-Königsbuch, Symbolic heuristic policy iteration algorithms for structured decision-theoretic exploration problems, in Workshop on Reasoning under Uncertainty in Robotics - RUR'2005, Edinburgh, Scotland (2005).  Zbl1112.90092
  19. Z. Feng and E. Hansen, Symbolic heuristic search for factored markov decision processes, in Proceedings of 18th Conf. AAAI 2002, Edmonton, Alberta, Canada (2002) 455–460.  
  20. Z. Feng, E.A. Hansen and S. Zilberstein, Symbolic generalization for on-line planning, in Proceedings of 19th Conf. UAI 2003, Acapulco, Mexico (2003) 209–216.  
  21. C. Guestrin, M. Hauskrecht and B. Kveton, Solving factored mdps with continuous and discrete variables, in Proceedings of 20th Conf. UAI 2004, Banff, Canada (2004).  Zbl1182.68252
  22. C. Guestrin, D. Koller, R. Parr and S. Venkataraman, Efficient solution algorithms for factored mdps. J. Artificial Intell. Res.19 (2003) 399–468.  Zbl1026.68125
  23. E.A. Hansen and S. Zilberstein, Lao*: A heuristic search algorithm that finds solutions with loops. Artificial Intell.129 (2001) 35–62.  Zbl0971.68036
  24. M. Hauskrecht, N. Meuleau, L.P. Kaelbling, T.L. Dean and C. Boutilier, Hierarchical solution of markov decision processes using macro-actions, in Proceedings of 14th Conf. UAI 1998, San Francisco, CA (1998) 220–229.  
  25. J. Hoey, R. St-Aubin, A. Hu and C. Boutilier, Spudd: Stochastic planning using decision diagrams, in Proceedings of 15th Conf. UAI 1999, San Francisco, CA (1999) 279–288.  
  26. J. Hoey, R. St-Aubin, A. Hu and C. Boutilier, Optimal and approximate stochastic planning using decision diagrams. Technical Report TR-2000-05, University of British Columbia, 10 (2000).  
  27. K.-E. Kim and T. Dean, Solving factored mdps using non-homogeneous partitions. Artificial Intell.147 (2003) 225–251.  Zbl1082.68804
  28. B. Kveton and M. Hauskrecht, Heuristic refinements of approximate linear programming for factored continuous-state markov decision processes, in Proceedings of 14th Conf. ICAPS 2004, Whistler, Canada (2004) 306–314.  
  29. S.M. Lavalle, A Game-Theoretic Framework for Robot Motion Planning. Electrical engineering, University of Illinois, Urbana-Champaign (1995).  
  30. W.S. Lovejoy, A survey of algorithmic methods for partially observed markov decision processes. Technical Report 28, Annals of Operation Research (1991).  Zbl0717.90086
  31. R. Parr, Flexible decomposition algorithms for weakly coupled markov decision problems, in Proceedings of 14th Conf. UAI 1998, San Francisco, CA (1998) 422–430.  
  32. J. Pineau, G. Gordon and S. Thrun, Policy-contingent abstraction for robust robot control, in Conference on Uncertainty in Articifical Intelligence (UAI) (2003) 477–484.  
  33. M.L. Puterman, Markov Decision Processes. John Wiley & Sons, INC (1994).  Zbl0829.90134
  34. R.I. Bahar, E.A. Frohm, C.M. Gaona, G.D. Hachtel, E. Macii, A. Pardo and F. Somenzi, Algebraic Decision Diagrams and Their Applications, in IEEE /ACM International Conference on CAD, Santa Clara, California, 1993. IEEE Computer Society Press 188–191.  
  35. Y. Saad, Iterative Methods for Sparse Linear Systems. Society of Industrial and Applied Mathematics, second edition (2003).  Zbl1031.65046
  36. R. Sabbadin, Graph partitioning techniques for markov decision processes decomposition, in Proceedings of the 15th ECAI 2002, Lyon, France (July 2002) 670–674.  
  37. R. St-Aubin, J. Hoey and C. Boutilier, APRICODD: Approximate policy construction using decision diagrams, in NIPS (2000) 1089–1095.  
  38. R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998).  
  39. F. Teichteil-Königsbuch and P. Fabiani, Processus Décisionnel de Markov décomposé et factorisé pour l'optimisation d'une stratégie d'exploration. Revue d'Intelligence Artificielle20 (2006) 133-179.  
  40. F. Teichteil-Königsbuch and P. Fabiani, Symbolic heuristic policy iteration algorithms for structured decision-theoretic exploration problems, in Workshop on Planning under Uncertainty for Autonomous Systems ICAPS'2005, Monterey, CA, USA (2005).  Zbl1112.90092
  41. G. Verfaillie, F. Garcia and L. Péret, Deployment and Maintenance of a Constellation of Satellites: a Benchmark, in Proceedings of ICAPS'03 Workshop on Planning under Uncertainty and Incomplete Information, Trento, Italy (June 2003).  

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.