Optimal estimators in learning theory

V. N. Temlyakov

Banach Center Publications (2006)

  • Volume: 72, Issue: 1, page 341-366
  • ISSN: 0137-6934

Abstract
This paper is a survey of recent results on some problems of supervised learning in the setting formulated by Cucker and Smale. Supervised learning, or learning from examples, refers to a process that builds, on the basis of available data of inputs xᵢ and outputs yᵢ, i = 1,...,m, a function that best represents the relation between the inputs x ∈ X and the corresponding outputs y ∈ Y. The goal is to find an estimator f_z, on the basis of the given data z := ((x₁,y₁),...,(x_m,y_m)), that approximates well the regression function f_ρ of an unknown Borel probability measure ρ defined on Z = X × Y. We assume that (xᵢ,yᵢ), i = 1,...,m, are independent and distributed according to ρ. We discuss the problem of finding optimal (in the sense of order) estimators for different classes Θ (we assume f_ρ ∈ Θ). It is known from previous work that the behavior of the entropy numbers ϵₙ(Θ,B) of Θ in a Banach space B plays an important role in this problem. The standard way of measuring the error between a target function f_ρ and an estimator f_z is to use the L₂(ρ_X) norm (ρ_X is the marginal probability measure on X generated by ρ). The usual way in regression theory to evaluate the performance of the estimator f_z is by studying its convergence in expectation, i.e. the rate of decay of the quantity E(||f_ρ - f_z||²_{L₂(ρ_X)}) as the sample size m increases. Here the expectation is taken with respect to the product measure ρ^m defined on Z^m. A more accurate and more delicate way of evaluating the performance of f_z has been pushed forward in [CS], where the authors study the probability distribution function ρ^m{z: ||f_ρ - f_z||_{L₂(ρ_X)} ≥ η} instead of the expectation E(||f_ρ - f_z||²_{L₂(ρ_X)}). In this survey we mainly discuss the optimization problem formulated in terms of the probability distribution function.
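The two performance criteria contrasted in the abstract can be written out side by side; the following display is a restatement in the abstract's own notation, not additional material from the paper:

```latex
% Expectation criterion: rate of decay of the mean squared L_2(\rho_X)-error,
% averaged over samples z drawn from the product measure \rho^m on Z^m.
\[
  \mathbb{E}_{z \sim \rho^m}
  \bigl\| f_\rho - f_z \bigr\|_{L_2(\rho_X)}^{2}
  \;\longrightarrow\; 0
  \qquad \text{as } m \to \infty .
\]
% Probability-distribution-function criterion of [CS]: for each accuracy
% level \eta > 0, bound the measure of the "bad" samples directly.
\[
  \rho^m \bigl\{ z \,:\, \| f_\rho - f_z \|_{L_2(\rho_X)} \ge \eta \bigr\}.
\]
```

The second criterion is the finer one: since E(X²) = ∫₀^∞ 2η P(X ≥ η) dη for a nonnegative random variable X, a bound on the probability distribution function that holds for all η > 0 yields, by integration, a bound on the expectation, while the converse direction recovers only the Markov-type estimate.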

How to cite
V. N. Temlyakov. "Optimal estimators in learning theory." Banach Center Publications 72.1 (2006): 341-366. <http://eudml.org/doc/282344>.

@article{V2006,
abstract = {This paper is a survey of recent results on some problems of supervised learning in the setting formulated by Cucker and Smale. Supervised learning, or learning from examples, refers to a process that builds, on the basis of available data of inputs $x_i$ and outputs $y_i$, i = 1,...,m, a function that best represents the relation between the inputs x ∈ X and the corresponding outputs y ∈ Y. The goal is to find an estimator $f_{z}$, on the basis of the given data $z := ((x₁,y₁),...,(x_m,y_m))$, that approximates well the regression function $f_ρ$ of an unknown Borel probability measure ρ defined on Z = X × Y. We assume that $(x_i,y_i)$, i = 1,...,m, are independent and distributed according to ρ. We discuss the problem of finding optimal (in the sense of order) estimators for different classes Θ (we assume $f_ρ ∈ Θ$). It is known from previous work that the behavior of the entropy numbers ϵₙ(Θ,B) of Θ in a Banach space B plays an important role in this problem. The standard way of measuring the error between a target function $f_ρ$ and an estimator $f_{z}$ is to use the $L₂(ρ_X)$ norm ($ρ_X$ is the marginal probability measure on X generated by ρ). The usual way in regression theory to evaluate the performance of the estimator $f_{z}$ is by studying its convergence in expectation, i.e. the rate of decay of the quantity $E(||f_{ρ} - f_{z}||²_{L₂(ρ_X)})$ as the sample size m increases. Here the expectation is taken with respect to the product measure $ρ^m$ defined on $Z^m$. A more accurate and more delicate way of evaluating the performance of $f_{z}$ has been pushed forward in [CS], where the authors study the probability distribution function $ρ^m\{z: ||f_{ρ} - f_{z}||_{L₂(ρ_X)} ≥ η\}$ instead of the expectation $E(||f_{ρ} - f_{z}||²_{L₂(ρ_X)})$. In this survey we mainly discuss the optimization problem formulated in terms of the probability distribution function.},
author = {V. N. Temlyakov},
journal = {Banach Center Publications},
language = {eng},
number = {1},
pages = {341-366},
title = {Optimal estimators in learning theory},
url = {http://eudml.org/doc/282344},
volume = {72},
year = {2006},
issn = {0137-6934},
}

TY - JOUR
AU - V. N. Temlyakov
TI - Optimal estimators in learning theory
JO - Banach Center Publications
PY - 2006
VL - 72
IS - 1
SP - 341
EP - 366
SN - 0137-6934
AB - This paper is a survey of recent results on some problems of supervised learning in the setting formulated by Cucker and Smale. Supervised learning, or learning from examples, refers to a process that builds, on the basis of available data of inputs $x_i$ and outputs $y_i$, i = 1,...,m, a function that best represents the relation between the inputs x ∈ X and the corresponding outputs y ∈ Y. The goal is to find an estimator $f_{z}$, on the basis of the given data $z := ((x₁,y₁),...,(x_m,y_m))$, that approximates well the regression function $f_ρ$ of an unknown Borel probability measure ρ defined on Z = X × Y. We assume that $(x_i,y_i)$, i = 1,...,m, are independent and distributed according to ρ. We discuss the problem of finding optimal (in the sense of order) estimators for different classes Θ (we assume $f_ρ ∈ Θ$). It is known from previous work that the behavior of the entropy numbers ϵₙ(Θ,B) of Θ in a Banach space B plays an important role in this problem. The standard way of measuring the error between a target function $f_ρ$ and an estimator $f_{z}$ is to use the $L₂(ρ_X)$ norm ($ρ_X$ is the marginal probability measure on X generated by ρ). The usual way in regression theory to evaluate the performance of the estimator $f_{z}$ is by studying its convergence in expectation, i.e. the rate of decay of the quantity $E(||f_{ρ} - f_{z}||²_{L₂(ρ_X)})$ as the sample size m increases. Here the expectation is taken with respect to the product measure $ρ^m$ defined on $Z^m$. A more accurate and more delicate way of evaluating the performance of $f_{z}$ has been pushed forward in [CS], where the authors study the probability distribution function $ρ^m\{z: ||f_{ρ} - f_{z}||_{L₂(ρ_X)} ≥ η\}$ instead of the expectation $E(||f_{ρ} - f_{z}||²_{L₂(ρ_X)})$. In this survey we mainly discuss the optimization problem formulated in terms of the probability distribution function.
LA - eng
UR - http://eudml.org/doc/282344
ER -
