Bi-personal stochastic transient Markov games with stopping times and total reward criterion

Martínez-Cortés Victor Manuel

Kybernetika (2021)

  • Issue: 1, pages 1-14
  • ISSN: 0023-5954

Abstract

The article is devoted to a class of two-person (players 1 and 2) zero-sum Markov games evolving in discrete time on transient Markov reward chains. At each decision time the second player can stop the system by paying a terminal reward to the first player. If the system is not stopped, the first player selects a decision and two things happen: the Markov chain moves to its next state according to the known transition law, and the second player pays a reward to the first player. The first player tries to maximize his total expected reward, while the second player tries to minimize his total expected cost. Observe that if the second player is a dummy, the problem reduces to finding an optimal policy of a transient Markov reward chain. Contraction properties of the transient model make it possible to apply the Banach Fixed Point Theorem and establish a Nash equilibrium. The obtained results are illustrated on two numerical examples.
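
The solution method suggested by the abstract, iterating the stopping-game Bellman operator until it reaches its unique fixed point, can be illustrated with a small sketch. The following Python snippet is not taken from the paper: the state space, the transition law P, the continuation rewards r and the terminal rewards s are invented toy data, and the operator is a standard min-max stopping form assumed to match the model described above.

import numpy as np

# Illustrative toy data only (3 states, 2 actions for player 1); all numbers invented.
# P[a, i, j]: probability of moving from state i to state j under player 1's action a.
# Every row sums to less than 1, which is the transience (contraction) property.
P = np.array([[[0.3, 0.2, 0.1],
               [0.1, 0.4, 0.2],
               [0.2, 0.1, 0.3]],
              [[0.2, 0.3, 0.2],
               [0.3, 0.1, 0.1],
               [0.1, 0.2, 0.2]]])
r = np.array([[4.0, 1.0, 2.0],    # r[a, i]: reward paid to player 1 when play continues
              [3.0, 2.5, 1.5]])
s = np.array([6.0, 3.0, 5.0])     # s[i]: terminal reward paid to player 1 if player 2 stops

V = np.zeros(3)
for _ in range(1000):
    continuation = r + P @ V                      # (P @ V)[a, i] = sum_j P[a, i, j] * V[j]
    TV = np.minimum(s, continuation.max(axis=0))  # player 2: stop, or let player 1 maximize
    if np.max(np.abs(TV - V)) < 1e-10:
        V = TV
        break
    V = TV

print("value of the game per state:", V)

Since every row of P sums to at most 0.7 in this toy example, the operator is a sup-norm contraction with modulus at most 0.7, so by the Banach Fixed Point Theorem the iterates converge geometrically to the value of the game, and equilibrium strategies can be read off from the final minimizing and maximizing choices.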

How to cite


Martínez-Cortés, Victor Manuel. "Bi-personal stochastic transient Markov games with stopping times and total reward criterion." Kybernetika (2021): 1-14. <http://eudml.org/doc/297447>.

@article{VictorManuel2021,
abstract = {The article is devoted to a class of two-person (players 1 and 2) zero-sum Markov games evolving in discrete time on transient Markov reward chains. At each decision time the second player can stop the system by paying a terminal reward to the first player. If the system is not stopped, the first player selects a decision and two things happen: the Markov chain moves to its next state according to the known transition law, and the second player pays a reward to the first player. The first player tries to maximize his total expected reward, while the second player tries to minimize his total expected cost. Observe that if the second player is a dummy, the problem reduces to finding an optimal policy of a transient Markov reward chain. Contraction properties of the transient model make it possible to apply the Banach Fixed Point Theorem and establish a Nash equilibrium. The obtained results are illustrated on two numerical examples.},
author = {Martínez-Cortés, Victor Manuel},
journal = {Kybernetika},
keywords = {two-person Markov games; stopping times; stopping times in transient Markov decision chains; transient and communicating Markov chains},
language = {eng},
number = {1},
pages = {1-14},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Bi-personal stochastic transient Markov games with stopping times and total reward criterion},
url = {http://eudml.org/doc/297447},
year = {2021},
}

TY - JOUR
AU - Martínez-Cortés, Victor Manuel
TI - Bi-personal stochastic transient Markov games with stopping times and total reward criterion
JO - Kybernetika
PY - 2021
PB - Institute of Information Theory and Automation AS CR
IS - 1
SP - 1
EP - 14
AB - The article is devoted to a class of two-person (players 1 and 2) zero-sum Markov games evolving in discrete time on transient Markov reward chains. At each decision time the second player can stop the system by paying a terminal reward to the first player. If the system is not stopped, the first player selects a decision and two things happen: the Markov chain moves to its next state according to the known transition law, and the second player pays a reward to the first player. The first player tries to maximize his total expected reward, while the second player tries to minimize his total expected cost. Observe that if the second player is a dummy, the problem reduces to finding an optimal policy of a transient Markov reward chain. Contraction properties of the transient model make it possible to apply the Banach Fixed Point Theorem and establish a Nash equilibrium. The obtained results are illustrated on two numerical examples.
LA - eng
KW - two-person Markov games; stopping times; stopping times in transient Markov decision chains; transient and communicating Markov chains
UR - http://eudml.org/doc/297447
ER -

