### Adaptive control for discrete-time Markov processes with unbounded costs: Discounted criterion

We study the adaptive control problem for discrete-time Markov control processes with Borel state and action spaces and possibly unbounded one-stage costs. The processes are given by the recurrent equations $x_{t+1}=F(x_t,a_t,\xi_t)$, $t=0,1,\ldots$, where the $\xi_t$ are i.i.d. $\mathbb{R}^k$-valued random vectors whose density $\rho$ is unknown. Assuming that the $\xi_t$ are observable, we propose a statistical estimation procedure for $\rho$ that allows us to prove discounted asymptotic optimality of two types of adaptive policies used earlier for processes with bounded costs.
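The setting above can be illustrated with a minimal numerical sketch. Everything concrete here is an assumption for illustration only: the one-stage dynamics $F(x,a,\xi)=x+a+\xi$, the placeholder stationary policy, the true noise density (standard normal), and the use of a Gaussian kernel density estimate as the statistical estimator of $\rho$ are all hypothetical choices, not the paper's construction; the abstract's general Borel-space model and its specific estimation procedure are not reproduced here.

```python
import math
import random

def F(x, a, xi):
    """Hypothetical one-stage dynamics x_{t+1} = F(x_t, a_t, xi_t).
    The paper treats a general Borel-space model; this linear form is
    an illustrative stand-in."""
    return x + a + xi

def kde(samples, bandwidth):
    """Gaussian kernel density estimate built from the observed
    disturbances xi_0, ..., xi_{n-1}; one possible estimator of the
    unknown density rho (not necessarily the paper's)."""
    n = len(samples)
    c = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def rho_hat(z):
        return c * sum(math.exp(-0.5 * ((z - s) / bandwidth) ** 2)
                       for s in samples)
    return rho_hat

random.seed(0)
x = 0.0
observed = []
for t in range(500):
    a = -0.5 * x                  # placeholder stationary policy
    xi = random.gauss(0.0, 1.0)   # true rho: N(0,1), unknown to the controller
    x = F(x, a, xi)
    observed.append(xi)           # xi_t is assumed observable

rho_hat = kde(observed, bandwidth=0.3)
# rho_hat(0.0) approximates the true density value 1/sqrt(2*pi)
```

An adaptive policy would, at each stage, act optimally (or nearly so) for the model with $\rho$ replaced by the current estimate $\hat\rho_t$; the paper's results concern the discounted asymptotic optimality of such schemes.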