Displaying similar documents to “Learning in games with bounded memory”

Parrondo's paradox.

Berresford, Geoffrey C., Rockett, Andrew M. (2003)

International Journal of Mathematics and Mathematical Sciences

Similarity:

Equilibria in a class of games and topological results implying their existence.

R.S. Simon, S. Spiez, H. Torunczyk (2008)

RACSAM

Similarity:

We survey results related to the problem of the existence of equilibria in some classes of infinitely repeated two-person games of incomplete information on one side, first considered by Aumann, Maschler, and Stearns. We generalize this setting to the broader one of principal-agent problems. We also discuss the topological results needed, presenting them dually (using cohomology in place of homology) and more systematically than in our earlier papers.

Risk-sensitive Markov stopping games with an absorbing state

Jaicer López-Rivero, Rolando Cavazos-Cadena, Hugo Cruz-Suárez (2022)

Kybernetika

Similarity:

This work is concerned with discrete-time Markov stopping games with two players. At each decision time player II can stop the game, paying a terminal reward to player I, or can let the system continue its evolution. In the latter case player I applies an action affecting the transitions, entitling him to receive a running reward from player II. It is supposed that player I has a nonnull and constant risk-sensitivity coefficient, and that player II tries to minimize the utility...
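Editorial orientation, not part of the quoted abstract: in this literature a constant, nonnull risk-sensitivity coefficient λ ≠ 0 typically enters through an exponential-utility (certainty-equivalent) criterion. A minimal sketch, where R_τ denotes the total reward player I accumulates up to the stopping time τ and both symbols are introduced here only for illustration:

\[
  V_\lambda(x) \;=\; \frac{1}{\lambda}\,\log \mathbb{E}_x\!\left[ e^{\lambda R_\tau} \right],
  \qquad \lambda \neq 0 .
\]

Under such a criterion player I acts as a maximizer and player II as a minimizer, consistent with the truncated sentence above.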

Denumerable Markov stopping games with risk-sensitive total reward criterion

Manuel A. Torres-Gomar, Rolando Cavazos-Cadena, Hugo Cruz-Suárez (2024)

Kybernetika

Similarity:

This paper studies Markov stopping games with two players on a denumerable state space. At each decision time player II has two actions: to stop the game, paying a terminal reward to player I, or to let the system continue its evolution. In the latter case, player I selects an action affecting the transitions and charges a running reward to player II. The performance of each pair of strategies is measured by the risk-sensitive total expected reward of player I. Under mild continuity...
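As a companion illustration (again not drawn from the paper), the equilibrium notion for such stopping games is usually phrased as a saddle point of the performance index J, with π ranging over player I's strategies and τ over player II's stopping times; the notation is introduced only for this sketch:

\[
  J(x,\pi,\tau^{*}) \;\le\; J(x,\pi^{*},\tau^{*}) \;\le\; J(x,\pi^{*},\tau)
  \qquad \text{for all } \pi \text{ and } \tau ,
\]

so that neither player can gain by a unilateral deviation, and the common middle term is the value of the game at the initial state x.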

Modeling shortest path games with Petri nets: a Lyapunov based theory

Julio Clempner (2006)

International Journal of Applied Mathematics and Computer Science

Similarity:

In this paper we introduce a new modeling paradigm for representing shortest path games with Petri nets. Whereas previous works have restricted attention to tracking the net using Bellman's equation as a utility function, this work uses a Lyapunov-like function. In this sense, we replace the traditional cost function with a trajectory-tracking function which is also an optimal cost-to-target function. This makes a significant difference in the conceptualization of the problem domain,...
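To make the "trajectory-tracking / optimal cost-to-target" idea concrete, here is a small illustrative sketch that is not the paper's Petri-net construction: a plain cost-to-target function L computed by value iteration on a toy graph (node names and costs invented). L behaves like a Lyapunov function in that it vanishes at the target and strictly decreases along every optimal move.

# Illustrative sketch only: a cost-to-target function on a toy directed graph,
# standing in for the Lyapunov-like trajectory-tracking function discussed above.
# Node names and edge costs are invented for the example.

import math

# edges[u] = list of (successor, cost) pairs
edges = {
    "p1": [("p2", 1.0), ("p3", 4.0)],
    "p2": [("p3", 1.0), ("target", 5.0)],
    "p3": [("target", 1.0)],
    "target": [],
}

def cost_to_target(edges, target="target", sweeps=50):
    """Value iteration for L(u) = min over edges (u, v) of cost(u, v) + L(v)."""
    L = {u: math.inf for u in edges}
    L[target] = 0.0
    for _ in range(sweeps):
        for u, successors in edges.items():
            if u == target or not successors:
                continue
            L[u] = min(cost + L[v] for v, cost in successors)
    return L

L = cost_to_target(edges)
print(L)  # {'p1': 3.0, 'p2': 2.0, 'p3': 1.0, 'target': 0.0}

Following an edge that attains the minimum always lowers L, which is the kind of decrease property alluded to above.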

Approximations of dynamic Nash games with general state and action spaces and ergodic costs for the players

Tomasz Bielecki (1997)

Applicationes Mathematicae

Similarity:

The purpose of this paper is to prove the existence of an ε-equilibrium point in a dynamic Nash game with Borel state space and long-run time average cost criteria for the players. The idea of the proof is first to convert the initial game with ergodic costs to an "equivalent" game endowed with discounted costs for some appropriately chosen value of the discount factor, and then to approximate the discounted Nash game obtained in the first step with a countable state space game for which...
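For orientation (not quoted from the paper), an ε-equilibrium under the long-run average cost criterion can be sketched as follows, with J_i denoting player i's ergodic cost and the notation introduced only for this illustration:

\[
  J_i\bigl(x, \pi^{*}\bigr) \;\le\; J_i\bigl(x, (\sigma_i, \pi^{*}_{-i})\bigr) + \varepsilon
  \qquad \text{for every player } i \text{ and every unilateral deviation } \sigma_i ,
\]

i.e. no player can lower his long-run average cost by more than ε by deviating alone from the profile π*.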

Markov stopping games with an absorbing state and total reward criterion

Rolando Cavazos-Cadena, Luis Rodríguez-Gutiérrez, Dulce María Sánchez-Guillermo (2021)

Kybernetika

Similarity:

This work is concerned with discrete-time zero-sum games with Markov transitions on a denumerable space. At each decision time player II can stop the system, paying a terminal reward to player I, or can let the system continue its evolution. If the system is not halted, player I selects an action which affects the transitions and receives a running reward from player II. Assuming the existence of an absorbing state which is accessible from any other state, the performance of a pair...
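A schematic optimality (Shapley-type) equation often associated with games of this kind, offered only as orientation and not quoted from the paper; G(x) stands for the terminal reward paid on stopping, r(x,a) for the running reward, and p(y|x,a) for the transition law, all introduced here for the sketch:

\[
  V(x) \;=\; \min\Bigl\{ G(x),\;
  \max_{a}\Bigl[ r(x,a) + \sum_{y} p(y \mid x, a)\, V(y) \Bigr] \Bigr\},
\]

where the outer minimum reflects player II's choice between stopping and continuing, and the inner maximum player I's choice of action while the game runs.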

Evolving small-board Go players using coevolutionary temporal difference learning with archives

Krzysztof Krawiec, Wojciech Jaśkowski, Marcin Szubert (2011)

International Journal of Applied Mathematics and Computer Science

Similarity:

We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique which interweaves two search processes operating in intra-game and inter-game modes. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to differences observed between its values for consecutively...
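Since the abstract describes gradient-descent TD learning of a linear, weighted-piece-counter evaluation, a minimal self-contained sketch of that update is given below. It is an illustration only: the feature vector, learning rate and toy positions are invented, and this is not the authors' code.

# Minimal sketch of a gradient-descent TD(0) update for a linear evaluation
# function V(s) = w . phi(s), the "weighted piece counter" setting above.
# Features, learning rate and toy positions are invented for illustration.

import numpy as np

def td0_update(w, phi_s, phi_s_next, reward, alpha=0.01, gamma=1.0):
    """One TD(0) step: move w along the gradient of V(s) scaled by the TD error."""
    delta = reward + gamma * (w @ phi_s_next) - (w @ phi_s)  # temporal-difference error
    return w + alpha * delta * phi_s                         # grad_w V(s) = phi_s

# Toy usage with three hypothetical features (e.g. own stones, opponent stones, bias).
w = np.array([0.1, -0.1, 0.0])
phi_before = np.array([4.0, 3.0, 1.0])
phi_after = np.array([5.0, 3.0, 1.0])
w = td0_update(w, phi_before, phi_after, reward=0.0)
print(w)

In CTDL this intra-game update is interleaved with an inter-game coevolutionary search over the weight vectors, as the abstract describes.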