We offer a quantitative estimation of the stability of risk-sensitive cost optimization in the problem of optimal stopping of a Markov chain on a Borel space. It is supposed that the original transition probability is approximated by another transition probability, and that the stopping rule which is optimal for the process with the approximating transition probability is applied to the process with the original transition probability. We give an upper bound (expressed in terms of the total variation distance: for...
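For reference, one common normalization of the total variation distance appearing in such bounds (a standard definition, not quoted from the paper): for probability measures μ and ν on a Borel space X,

\[ \|\mu - \nu\|_{TV} = \sup_{B \in \mathcal{B}(X)} |\mu(B) - \nu(B)|, \]

and the discrepancy between two transition probabilities p and q is then typically measured by \( \sup_{x \in X} \|p(\cdot \mid x) - q(\cdot \mid x)\|_{TV} \).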
This work concerns controlled Markov chains with finite state space and nonnegative rewards; it is assumed that the controller has a constant risk-sensitivity, and that the performance of a control policy is measured by a risk-sensitive expected total-reward criterion. The existence of optimal stationary policies is studied within this context, and the main result establishes the optimality of a stationary policy achieving the supremum in the corresponding optimality equation, whenever the associated...
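For reference, the risk-sensitive expected total-reward criterion with constant risk-sensitivity coefficient λ > 0 commonly takes the form (our notation, not necessarily the paper's):

\[ J(\pi, x) = \frac{1}{\lambda} \log E_x^{\pi}\!\left[ \exp\!\left( \lambda \sum_{t=0}^{\infty} r(x_t, a_t) \right) \right], \]

where r ≥ 0 is the one-step reward, π is a control policy, and x is the initial state.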
We extend previous results of the same authors ([11]) on the effects of perturbation in the transition probability of a Markov cost chain for discounted Markov control processes. Assuming that conditions of Lyapunov and Harris type hold for each stationary policy, we obtain upper bounds for the index of perturbations, defined as the difference of the total expected discounted costs for the original Markov control process and the perturbed one. We present examples that satisfy our conditions.
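Writing V_f and Ṽ_f for the total expected discounted costs of the original and perturbed processes under a stationary policy f (our notation; the paper's exact formulation may differ), the index of perturbations described above is of the form

\[ \Delta(x) = \left| V_f(x) - \widetilde{V}_f(x) \right|, \qquad V_f(x) = E_x^{f}\!\left[ \sum_{t=0}^{\infty} \alpha^{t}\, c\big(x_t, f(x_t)\big) \right], \]

with discount factor α ∈ (0, 1) and one-step cost c.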
This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified so that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action pairs,...
We analyse a Markov chain and perturbations of the transition probability and the one-step cost function (possibly unbounded) defined on it. Under certain conditions of Lyapunov and Harris type, we obtain new estimates of the effects of such perturbations via an index of perturbations, defined as the difference of the total expected discounted costs between the original Markov chain and the perturbed one. We provide an example which illustrates our analysis.
This work concerns Markov decision processes with finite state space and compact action sets. The decision maker is supposed to have a constant risk-sensitivity coefficient, and a control policy is graded via the risk-sensitive expected total-reward criterion associated with nonnegative one-step rewards. Assuming that the optimal value function is finite, under mild continuity and compactness restrictions the following result is established: If the number of ergodic classes when a stationary policy...
This paper deals with Markov Control Processes (MCPs) on Euclidean spaces with an infinite horizon and a discounted total cost. Firstly, MCPs which result from deterministic controlled systems will be analyzed. For such MCPs, conditions will be given that permit establishing the equation known in the economics literature as the Euler Equation (EE). An example will also be presented of a Markov Control Process with a deterministic controlled system where, to obtain the optimal value function,...
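As an illustration of the kind of equation meant, here is the standard deterministic growth-model instance (a textbook example, not necessarily the paper's exact setting): with wealth dynamics x_{t+1} = f(x_t) − a_t, consumption a_t, utility u, and discount factor α, the EE reads

\[ u'(a_t) = \alpha\, u'(a_{t+1})\, f'(x_{t+1}), \qquad t = 0, 1, 2, \ldots, \]

which, combined with a transversality condition, characterizes the optimal consumption plan without first computing the value function.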
Firstly, this paper considers a certain class of possibly unbounded optimization problems on Euclidean spaces, for which conditions that permit obtaining monotone minimizers are given. Secondly, the theory developed in the first part of the paper is applied to Markov control processes (MCPs) on real spaces with a possibly unbounded cost function and possibly noncompact control sets, considering both the discounted and the average cost as the optimality criterion. In the context described,...
This paper deals with Markov decision processes (MDPs) with a real state space whose minimum is attained, and that are upper bounded by (uncontrolled) stochastically ordered (SO) Markov chains. We consider MDPs with (possibly) unbounded costs, and to evaluate the quality of each policy we use the objective function known as the average cost. For this objective function we consider two Markov control models M and M'. M and M' have the same components except for the transition laws. The transition...
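For reference, the average cost criterion used here is standardly defined by (notation ours)

\[ J(\pi, x) = \limsup_{n \to \infty} \frac{1}{n}\, E_x^{\pi}\!\left[ \sum_{t=0}^{n-1} c(x_t, a_t) \right], \]

the long-run expected cost per stage under policy π starting from state x.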
In this paper, conditions proposed in Flores-Hernández and Montes-de-Oca [3], which permit obtaining monotone minimizers of unbounded optimization problems on Euclidean spaces, are adapted in suitable versions to study noncooperative games on Euclidean spaces with noncompact sets of feasible joint strategies, in order to obtain increasing optimal best responses for each player. Moreover, in this noncompact framework, an algorithm to approximate the equilibrium points for noncooperative games is supplied....
In a Discounted Markov Decision Process (DMDP) with finite action sets the Value Iteration Algorithm, under suitable conditions, leads to an optimal policy in a finite number of steps. Determining an upper bound on the number of steps needed to reach convergence is an issue of great theoretical and practical interest, as it would provide a computationally feasible stopping rule for value iteration as an algorithm for finding an optimal policy. In this paper we find such a bound depending only...
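A minimal sketch of value iteration with a stopping rule of the kind discussed (the bound proved in the paper depends only on model data and is not reproduced here; the eps*(1-beta)/(2*beta) test below is the classical textbook criterion guaranteeing an eps-optimal greedy policy):

import numpy as np

def value_iteration(P, c, beta, eps=1e-8):
    # P: (A, S, S) array, P[a, s, t] = transition probability s -> t under action a
    # c: (S, A) array of one-step costs; beta: discount factor in (0, 1)
    S = c.shape[0]
    v = np.zeros(S)
    while True:
        # Bellman update: Q[s, a] = c(s, a) + beta * sum_t P(t | s, a) * v(t)
        Q = c + beta * np.einsum('ast,t->sa', P, v)
        v_new = Q.min(axis=1)
        # Classical stopping rule: the greedy policy w.r.t. v_new is eps-optimal
        # once successive iterates differ by less than eps * (1 - beta) / (2 * beta)
        if np.abs(v_new - v).max() < eps * (1 - beta) / (2 * beta):
            Q = c + beta * np.einsum('ast,t->sa', P, v_new)
            return v_new, Q.argmin(axis=1)  # value estimate and greedy policy
        v = v_new

# Toy 2-state, 2-action example (hypothetical data)
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.6, 0.4]]])  # action 1
c = np.array([[1.0, 2.0], [0.5, 0.3]])
v, policy = value_iteration(P, c, beta=0.95)
print(v, policy)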
In this paper a problem of consumption and investment is presented as a discrete-time discounted Markov decision process model. In this problem, it is assumed that the wealth is affected by a production function, which gives the investor a chance to increase his wealth before investing. To solve the problem, a suitable version of the Euler Equation (EE) is established which characterizes the optimal policy completely; that is, conditions are provided...
The authors introduce risk sensitivity into a model of sequential games where players do not know beforehand which of them will make a choice at each stage of the game. It is shown that every sequential game without a predetermined order of turns and with risk sensitivity has a Nash equilibrium, as does the case in which players have types that are chosen for them before the game starts and that are kept hidden from the other players. A couple of examples are also given that show how the equilibria might...
This paper deals with a certain class of unbounded optimization problems which depend on a parameter. Firstly, conditions are established which guarantee the continuity, with respect to the parameter, of the minimum of the optimization problems under consideration, as well as the upper semicontinuity of the multifunction which maps each parameter to its set of minimizers. Besides, under the additional condition of uniqueness of the minimizer, its...
The paper concerns Markov decision processes (MDPs) with both the state and decision spaces being finite and with the total reward as the objective function. For this kind of MDP, the authors assume that the reward function is of a fuzzy type. Specifically, this fuzzy reward function has a suitable trapezoidal shape and is a function of a standard non-fuzzy reward. The fuzzy control problem consists of determining a control policy that maximizes the fuzzy expected total reward, where...
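For readers unfamiliar with the term, a trapezoidal fuzzy number (a, b, c, d), a ≤ b ≤ c ≤ d, is determined by the membership function (standard definition, not specific to this paper)

\[ \mu(x) = \begin{cases} \dfrac{x-a}{b-a}, & a \le x < b, \\[4pt] 1, & b \le x \le c, \\[4pt] \dfrac{d-x}{d-c}, & c < x \le d, \\[4pt] 0, & \text{otherwise}. \end{cases} \]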
The main objective of this paper is to find structural conditions under which a stochastic game between two players with total reward functions has an ε-equilibrium. To reach this goal, results from Markov decision processes are used to find ε-optimal strategies for each player, and then a better-reply correspondence, together with a more general version of Kakutani's Fixed Point Theorem, is used to obtain the ε-equilibrium mentioned. Moreover, two examples illustrating the theory developed are presented....
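As a reminder (standard definition, our notation), a strategy pair (π*, σ*) is an ε-equilibrium, ε ≥ 0, if for all strategies π and σ,

\[ J_1(\pi, \sigma^*) \le J_1(\pi^*, \sigma^*) + \varepsilon, \qquad J_2(\pi^*, \sigma) \le J_2(\pi^*, \sigma^*) + \varepsilon, \]

where J_i denotes player i's total reward; for ε = 0 this reduces to a Nash equilibrium.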
Many examples in optimization, ranging from Linear Programming to Markov Decision Processes (MDPs), present more than one optimal solution. The study of this non-uniqueness is of great mathematical interest. In this paper the authors show that in a specific family of discounted MDPs, non-uniqueness is a “fragile” property through Ekeland's Principle: for each problem with at least two optimal policies, a perturbed model is produced with a unique optimal policy. This result not only supersedes previous...
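For context, Ekeland's variational principle (standard statement, our notation): if (X, d) is a complete metric space, f : X → (−∞, ∞] is lower semicontinuous and bounded below, and f(x₀) ≤ inf_X f + ε for some ε > 0, then for every λ > 0 there exists x̄ with

\[ f(\bar{x}) \le f(x_0), \qquad d(\bar{x}, x_0) \le \lambda, \qquad f(y) > f(\bar{x}) - \tfrac{\varepsilon}{\lambda}\, d(y, \bar{x}) \quad \text{for all } y \ne \bar{x}. \]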
The article presents an extension of the theory of standard Markov decision processes on discrete spaces with the average cost as the objective function which permits taking into account a fuzzy average cost of a trapezoidal type. In this context, the fuzzy optimal control problem is considered with respect to two cases: the max-order of the fuzzy numbers, and the average ranking order of the trapezoidal fuzzy numbers. Each of these cases extends the standard optimal control problem, and for...