Displaying 41 – 60 of 132

Economic assessment of the Champagne wine qualitative stock mecanism

Jacques Laye, Maximilien Laye (2006)

RAIRO - Operations Research

In the wine AOC system, the regulation of quantities performed by the professional organizations aims to smooth the variations in wine quality caused by climate variations that affect the quality of the grapes. Nevertheless, this regulation could be damaging to consumers because of the price increase resulting from the reduction of the quantities sold on the market. We propose a stochastic control model and a simulation tool able to measure the effects of this mechanism...

Estimates for perturbations of average Markov decision processes with a minimal state and upper bounded by stochastically ordered Markov chains

Raúl Montes-de-Oca, Francisco Salem-Silva (2005)

Kybernetika

This paper deals with Markov decision processes (MDPs) with a real state space whose minimum is attained, and that are upper bounded by (uncontrolled) stochastically ordered (SO) Markov chains. We consider MDPs with (possibly) unbounded costs, and to evaluate the quality of each policy, we use the objective function known as the average cost. For this objective function we consider two Markov control models that have the same components except for the transition laws. The transition...

Estimates for perturbations of general discounted Markov control chains

Raúl Montes-de-Oca, Alexander Sakhanenko, Francisco Salem-Silva (2003)

Applicationes Mathematicae

We extend previous results of the same authors ([11]) on the effects of perturbation in the transition probability of a Markov cost chain for discounted Markov control processes. Assuming that conditions of Lyapunov and Harris type hold for each stationary policy, we get upper bounds for the index of perturbations, defined as the difference of the total expected discounted costs for the original Markov control process and the perturbed one. We present examples that satisfy our conditions.

Estimation and control in finite Markov decision processes with the average reward criterion

Rolando Cavazos-Cadena, Raúl Montes-de-Oca (2004)

Applicationes Mathematicae

This work concerns Markov decision chains with finite state and action sets. The transition law satisfies the simultaneous Doeblin condition but is unknown to the controller, and the problem of determining an optimal adaptive policy with respect to the average reward criterion is addressed. A subset of policies is identified so that, when the system evolves under a policy in that class, the frequency estimators of the transition law are consistent on an essential set of admissible state-action pairs,...

First passage risk probability optimality for continuous time Markov decision processes

Haifeng Huo, Xian Wen (2019)

Kybernetika

In this paper, we study continuous time Markov decision processes (CTMDPs) with a denumerable state space, a Borel action space, unbounded transition rates and nonnegative reward function. The optimality criterion to be considered is the first passage risk probability criterion. To ensure the non-explosion of the state processes, we first introduce a so-called drift condition, which is weaker than the well known regular condition for semi-Markov decision processes (SMDPs). Furthermore, under some...

G-Réseaux dans un environnement aléatoire

Jean-Michel Fourneau, Dominique Verchère (2010)

RAIRO - Operations Research

We study networks with positive and negative customers (or Generalized networks of queues and signals) in a random environment. This environment may change the arrival rates, the routing probabilities, the service rates and also the effect of signals. We prove that the steady-state distribution has a product form. This property is obtained as a corollary of a much more general result on multidimensional Markov chains.

Growth rates and average optimality in risk-sensitive Markov decision chains

Karel Sladký (2008)

Kybernetika

In this note we focus attention on characterizations of policies maximizing the growth rate of expected utility, along with the average of the associated certainty equivalent, in risk-sensitive Markov decision chains with finite state and action spaces. In contrast to the existing literature, the problem is handled by methods of stochastic dynamic programming on condition that the transition probabilities are replaced by general nonnegative matrices. Using the block-triangular decomposition of a collection...

Identification of optimal policies in Markov decision processes

Karel Sladký (2010)

Kybernetika

In this note we focus attention on identifying optimal policies and on eliminating suboptimal policies when minimizing optimality criteria in discrete-time Markov decision processes with finite state space and compact action set. We present a unified approach to value iteration algorithms that makes it possible to generate lower and upper bounds on optimal values, as well as on the current policy. Using the modified value iterations it is possible to eliminate suboptimal actions and to identify an optimal policy...

Influence of modeling structure in probabilistic sequential decision problems

Florent Teichteil-Königsbuch, Patrick Fabiani (2006)

RAIRO - Operations Research

Markov Decision Processes (MDPs) are a classical framework for stochastic sequential decision problems, based on an enumerated state space representation. More compact and structured representations have been proposed: factorization techniques use state variables representations, while decomposition techniques are based on a partition of the state space into sub-regions and take advantage of the resulting structure of the state transition graph. We use a family of probabilistic exploration-like...

Limiting average cost control problems in a class of discrete-time stochastic systems

Nadine Hilgert, Onesimo Hernández-Lerma (2001)

Applicationes Mathematicae

We consider a class of $\mathbb{R}^d$-valued stochastic control systems, with possibly unbounded costs. The systems evolve according to a discrete-time equation $x_{t+1} = G_n(x_t, a_t) + \xi_t$ ($t = 0, 1, \dots$), for each fixed $n = 0, 1, \dots$, where the $\xi_t$ are i.i.d. random vectors, and the $G_n$ are given functions converging pointwise to some function $G$ as $n \to \infty$. Under suitable hypotheses, our main results state the existence of stationary control policies that are expected average cost (EAC) optimal and sample path average cost (SPAC) optimal for...
