Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion
We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by recurrence equations with i.i.d. random vectors whose density is unknown. Assuming that these random vectors are observable, we propose a statistical estimation procedure for the unknown density that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.