Bottom-up learning of hierarchical models in a class of deterministic POMDP environments

Hideaki Itoh; Hisao Fukumoto; Hiroshi Wakuya; Tatsuya Furukawa

Bottom-up learning of hierarchical models in a class of deterministic POMDP environments

Hideaki Itoh; Hisao Fukumoto; Hiroshi Wakuya; Tatsuya Furukawa

International Journal of Applied Mathematics and Computer Science (2015)

Volume: 25, Issue: 3, page 597-615
ISSN: 1641-876X

Access Full Article

top

Access to full text

Full (PDF)

Abstract

top

The theory of partially observable Markov decision processes (POMDPs) is a useful tool for developing various intelligent agents, and learning hierarchical POMDP models is one of the key approaches for building such agents when the environments of the agents are unknown and large. To learn hierarchical models, bottom-up learning methods in which learning takes place in a layer-by-layer manner from the lowest to the highest layer are already extensively used in some research fields such as hidden Markov models and neural networks. However, little attention has been paid to bottom-up approaches for learning POMDP models. In this paper, we present a novel bottom-up learning algorithm for hierarchical POMDP models and prove that, by using this algorithm, a perfect model (i.e., a model that can perfectly predict future observations) can be learned at least in a class of deterministic POMDP environments.

How to cite

top

MLA
BibTeX
RIS

Hideaki Itoh, et al. "Bottom-up learning of hierarchical models in a class of deterministic POMDP environments." International Journal of Applied Mathematics and Computer Science 25.3 (2015): 597-615. <http://eudml.org/doc/271792>.

@article{HideakiItoh2015,
abstract = {The theory of partially observable Markov decision processes (POMDPs) is a useful tool for developing various intelligent agents, and learning hierarchical POMDP models is one of the key approaches for building such agents when the environments of the agents are unknown and large. To learn hierarchical models, bottom-up learning methods in which learning takes place in a layer-by-layer manner from the lowest to the highest layer are already extensively used in some research fields such as hidden Markov models and neural networks. However, little attention has been paid to bottom-up approaches for learning POMDP models. In this paper, we present a novel bottom-up learning algorithm for hierarchical POMDP models and prove that, by using this algorithm, a perfect model (i.e., a model that can perfectly predict future observations) can be learned at least in a class of deterministic POMDP environments.},
author = {Hideaki Itoh, Hisao Fukumoto, Hiroshi Wakuya, Tatsuya Furukawa},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {partially observable Markov decision processes; hierarchical models; bottom-up learning},
language = {eng},
number = {3},
pages = {597-615},
title = {Bottom-up learning of hierarchical models in a class of deterministic POMDP environments},
url = {http://eudml.org/doc/271792},
volume = {25},
year = {2015},
}

TY - JOUR
AU - Hideaki Itoh
AU - Hisao Fukumoto
AU - Hiroshi Wakuya
AU - Tatsuya Furukawa
TI - Bottom-up learning of hierarchical models in a class of deterministic POMDP environments
JO - International Journal of Applied Mathematics and Computer Science
PY - 2015
VL - 25
IS - 3
SP - 597
EP - 615
AB - The theory of partially observable Markov decision processes (POMDPs) is a useful tool for developing various intelligent agents, and learning hierarchical POMDP models is one of the key approaches for building such agents when the environments of the agents are unknown and large. To learn hierarchical models, bottom-up learning methods in which learning takes place in a layer-by-layer manner from the lowest to the highest layer are already extensively used in some research fields such as hidden Markov models and neural networks. However, little attention has been paid to bottom-up approaches for learning POMDP models. In this paper, we present a novel bottom-up learning algorithm for hierarchical POMDP models and prove that, by using this algorithm, a perfect model (i.e., a model that can perfectly predict future observations) can be learned at least in a class of deterministic POMDP environments.
LA - eng
KW - partially observable Markov decision processes; hierarchical models; bottom-up learning
UR - http://eudml.org/doc/271792
ER -

References

top

Åström, K.J. (1965). Optimal control of Markov decision processes with incomplete state estimation, Journal of Mathematical Analysis and Applications 10(1): 174-205. Zbl0137.35803
Barto, A.G. and Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning, Discrete Event Dynamic Systems 13(4): 341-379. Zbl1034.93003
Bonet, B. (2009). Deterministic POMDPs revisited, Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), Montreal, Canada, pp. 59-66.
Bui, H.H., Phung, D.Q. and Venkatesh, S. (2004). Hierarchical hidden Markov models with general state hierarchy, Proceedings of the 19th National Conference on Artificial Intelligence (AAAI), San Jose, CA, USA, pp. 324-329.
Chang, H.S., Fard, P.J., Marcus, S.I. and Shayman, M. (2003). Multi-time scale Markov decision processes, IEEE Transactions on Automatic Control 48(6): 976-987.
Charlin, L., Poupart, P. and Shioda, R. (2007). Automated hierarchy discovery for planning in partially observable environments, in B. Schölkopf, J.C. Platt and T. Hofmann (Eds.), Advances in Neural Information Processing Systems 19 (NIPS 2006), The MIT Press, Cambridge, MA, pp. 225-232.
Chatzis, S.P. and Kosmopoulos, D. (2014). A partially-observable Markov decision process for dealing with dynamically changing environments, in L. Iliadis, I. Maglogiannis and H. Papadopoulos (Eds.), Artificial Intelligence Applications and Innovations, Springer-Verlag, Berlin, pp. 111-120.
Dean, T., Angluin, D., Basye, K., Engelson, S., Kaelbling, L., Kokkevis, E. and Maron, O. (1995). Inferring finite automata with stochastic output functions and an application to map learning, Machine Learning 18(1): 81-108.
Dietterich, T.G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition, Journal of Artificial Intelligence Research 13: 227-303. Zbl0963.68085
Doshi-Velez, F. (2009). The infinite partially observable Markov decision process, in Y. Bengio et al. (Eds.), Advances in Neural Information Processing Systems 22 (NIPS 2009), Curran Associates Inc., Red Hook, NY, pp. 477-485.
Doshi-Velez, F., Pfau, D., Wood, F. and Roy, N. (2015). Bayesian nonparametric methods for partially-observable reinforcement learning, IEEE Transactions on Pattern Analysis and Machine Intelligence 37(2): 394-407.
Drake, A. (1962). Observation of a Markov Process Through a Noisy Channel, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
Fine, S., Singer, Y. and Tishby, N. (1998). The hierarchical hidden Markov model: Analysis and applications, Machine Learning 32(1): 41-62. Zbl0901.68178
Foka, A. and Trahanias, P. (2007). Real-time hierarchical POMDPs for autonomous robot navigation, Robotics and Autonomous Systems 55(7): 561-571.
Gavaldà, R., Keller, P.W., Pineau, J. and Precup, D. (2006). PAC-learning of Markov models with hidden state, Proceedings of the 17th European Conference on Machine Learning (ECML), Berlin, Germany, pp. 150-161.
Heller, K.A., Teh, Y.W. and Görür, D. (2009). Infinite hierarchical hidden Markov models, Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater Beach, FL, USA, pp. 224-231.
Hengst, B. (2011). Hierarchical approaches, in M. Wiering and M. van Otterlo (Eds.), Reinforcement Learning: State of the Art, Springer-Verlag, Berlin, pp. 293-323.
Hinton, G.E., Osindero, S. and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets, Neural Computation 18(7): 1527-1554. Zbl1106.68094
Hoey, J., Poupart, P., Bertoldi, A., Craig, T., Boutilier, C. and Mihailidis, A. (2010). Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process, Computer Vision and Image Understanding 114(5): 503-519.
Holmes, M.P. and Isbell Jr., C.L. (2006). Looping suffix tree-based inference of partially observable hidden state, Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, PA, USA, pp. 409-416.
Kaelbling, L.P., Littman, M.L. and Cassandra, A.R. (1999). Planning and acting in partially observable stochastic domains, Artificial Intelligence 101(1-2): 99-134. Zbl0908.68165
Kolobov, A. (2012). Planning with Markov decision processes: An AI perspective, Synthesis Lectures on Artificial Intelligence and Machine Learning 6(1): 1-210. Zbl1270.68014
Kołodziej, J., Khan, S.U., Wang, L., Min-Allah, N., Madani, S.A., Ghani, N. and Li, H. (2011). An application of Markov jump process model for activity-based indoor mobility prediction in wireless networks, 9th IEEE International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan, pp. 51-56.
Li, H., Zhao, Q. and Yang, Z. (2007). Reliability modeling of fault tolerant control systems, International Journal of Applied Mathematics and Computer Science 17(4): 491-504, DOI: 10.2478/v10006-007-0041-0. Zbl1228.90031
Lim, Z., Sun, L. and Hsu, D.J. (2011). Monte Carlo value iteration with macro-actions, in J. Shawe-Taylor et al. (Eds.), Advances in Neural Information Processing Systems 24 (NIPS 2011), Curran Associates Inc., Red Hook, NY, pp. 1287-1295.
Littman, M.L. (1996). Algorithms for Sequential Decision Making, Ph.D. thesis, Brown University, Providence, RI.
Mahadevan, S. (1998). Partially observable semi-Markov decision processes: Theory and applications in engineering and cognitive science, AAAI Fall Symposium on Planning with Partially Observable Markov Decision Processes, Orlando, FL, USA, pp. 113-120.
Mihalkova, L. and Mooney, R.J. (2007). Bottom-up learning of Markov logic network structure, Proceedings of the 24th International Conference on Machine Learning (ICML), Corvallis, OR, USA, pp. 625-632.
Murphy, K.P. (2002). Representing hierarchical POMDPs as DBNs, with applications to mobile robot navigation, www.cs.ubc.ca/˜murphyk/mypapers.html.
Oliver, N., Garg, A. and Horvitz, E. (2004). Layered representations for learning and inferring office activity from multiple sensory channels, Computer Vision and Image Understanding 96(2): 163-180.
Oniszczuk, W. (2009). Semi-Markov-based approach for the analysis of open tandem networks with blocking and truncation, International Journal of Applied Mathematics and Computer Science 19(1): 151-163, DOI: 10.2478/v10006-009-0014-6. Zbl1167.90459
Pineau, J., Montemerlo, M., Pollack, M., Roy, N. and Thrun, S. (2003). Towards robotic assistants in nursing homes: Challenges and results, Robotics and Autonomous Systems 42(3): 271-281. Zbl1011.68806
Poupart, P. and Vlassis, N. (2008). Model-based Bayesian reinforcement learning in partially observable domains, Proceedings of the 10th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Fort Lauderdale, FL, USA, p. 8.
Rao, V. and Teh, Y. W. (2013). Fast MCMC sampling for Markov jump processes and extensions, The Journal of Machine Learning Research 14(1): 3295-3320. Zbl1318.60078
Ross, S., Pineau, J., Chaib-draa, B. and Kreitmann, P. (2011). A Bayesian approach for learning and planning in partially observable Markov decision processes, Journal of Machine Learning Research 12: 1729-1770. Zbl1280.68193
Roy, N., Pineau, J. and Thrun, S. (2000). Spoken dialogue management using probabilistic reasoning, Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL), Hong Kong, China, pp. 93-100.
Rusek, K., Janowski, L. and Papir, Z. (2014). Transient and stationary characteristics of a packet buffer modelled as an MAP/SM/1/b system, International Journal of Applied Mathematics and Computer Science 24(2): 429-442, DOI: 10.2478/amcs-2014-0033. Zbl1293.60070
Sallans, B. (2000). Learning factored representations for partially observable Markov decision processes, in S.A. Solla, T.K. Leen and K. Müller (Eds.), Advances in Neural Information Processing Systems 12 (NIPS 1999), The MIT Press, Cambridge, MA, pp. 1050-1056.
Shani, G., Brafman, R.I. and Shimony, S.E. (2005). Model-based online learning of POMDPs, Proceedings of the 16th European Conference on Machine Learning (ECML), Porto, Portugal, pp. 353-364.
Spaan, M.T.J. and Vlassis, N. (2005). Perseus: Randomized point-based value iteration for POMDPs, Journal of Artificial Intelligence Research 24: 195-220. Zbl1080.68674
Theocharous, G. (2002). Hierarchical Learning and Planning in Partially Observable Markov Decision Processes, Ph.D. thesis, Michigan State University, East Lansing, MI.
Theocharous, G. and Mahadevan, S. (2002). Approximate planning with hierarchical partially observable Markov decision process models for robot navigation, Proceedings of the 2002 IEEE International Conference on Robotics and Automation (ICRA), Washington, DC, USA, pp. 1347-1352.
Theocharous, G., Murphy, K. and Kaelbling, L.P. (2004). Representing hierarchical POMDPs as DBNs for multi-scale robot localization, Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA), New Orleans, LA, USA, Vol. 1, pp. 1045-1051.
Toussaint, M., Charlin, L. and Poupart, P. (2008). Hierarchical POMDP controller optimization by likelihood maximization, Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI), Helsinki, Finland, pp. 562-570.
Wakabayashi, K. and Miura, T. (2012). Forward-backward activation algorithm for hierarchical hidden Markov models, in F. Pereira et al. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS 2012), Curran Associates Inc., Red Hook, NY, pp. 1502-1510.
White, C.C. (1976). Procedures for the solution of a finite-horizon, partially observed, semi-Markov optimization problem, Operations Research 24(2): 348-358. Zbl0344.90038
Young, S., Gašić, M., Thomson, B. and Williams, J.D. (2013). POMDP-based statistical spoken dialog systems: A review, Proceedings of the IEEE 101(5): 1160-1179.
Youngblood, G.M. and Cook, D.J. (2007). Data mining for hierarchical model creation, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 37(4): 561-572.
Youngblood, G.M., Heierman, E.O., Cook, D.J. and Holder, L.B. (2005). Automated HPOMDP construction through data-mining techniques in the intelligent environment domain, Proceedings of the 18th International Florida Artificial Intelligence Research Society Conference (FLAIRS), Clearwater Beach, FL, USA, pp. 194-199.
Zamani, Z., Sanner, S., Poupart, P. and Kersting, K. (2012). Symbolic dynamic programming for continuous state and observation POMDPs, in F. Pereira et al. (Eds.), Advances in Neural Information Processing Systems 25 (NIPS 2012), Curran Associates Inc., Red Hook, NY, pp. 1403-1411.

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Language to use for this widget.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Number of notes per page

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.