Evolutionary prototype selection.
The paper deals with sufficient conditions for the existence of a general approximate minimum distance estimator (AMDE) of a probability density function on the real line. It shows that the AMDE always exists when the bounded φ-divergence, Kolmogorov, Lévy, Cramér, or discrepancy distance is used. Consequently, a consistency rate in any bounded φ-divergence is established for the Kolmogorov, Lévy, and discrepancy estimators under the condition that the degree of variations of the corresponding family...
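As a concrete illustration of the Kolmogorov case, the sketch below (a toy construction of ours, not the paper's estimator) grid-searches the location parameter of a Gaussian family for the value minimizing the Kolmogorov distance to the empirical CDF; the function names, the Gaussian family, and the grid are all assumptions:

```python
import math, random

def normal_cdf(x, mu, sigma=1.0):
    # CDF of N(mu, sigma^2) via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def kolmogorov_distance(sample, mu):
    # sup_x |F_n(x) - F_mu(x)|, evaluated at the jump points of the ECDF.
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        F = normal_cdf(x, mu)
        d = max(d, abs((i + 1) / n - F), abs(i / n - F))
    return d

def approx_min_kolmogorov(sample, grid):
    # Approximate MDE: the grid point minimizing the Kolmogorov distance.
    return min(grid, key=lambda mu: kolmogorov_distance(sample, mu))

random.seed(0)
sample = [random.gauss(3.0, 1.0) for _ in range(500)]
grid = [i / 100 for i in range(601)]        # candidate means in [0, 6]
mu_hat = approx_min_kolmogorov(sample, grid)
```

The grid search makes the "approximate" in AMDE concrete: the minimizer is only sought over a finite candidate set.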
For a sequence of statistical experiments with a finite parameter set, the asymptotic behavior of the maximum risk is studied for the problem of classification into disjoint subsets. The exponential rates of the optimal decision rule are determined and expressed in terms of the normalized limit of moment generating functions of likelihood ratios. Necessary and sufficient conditions for the existence of adaptive classification rules in the sense of Rukhin [Ru1] are given. The results are applied to...
In this paper the possibilities are discussed for training statistical pattern recognizers based on a distance representation of the objects instead of a feature representation. Distances or similarities between the unknown objects to be classified and a selected subset of the training objects (the support objects) are used. These distances are combined into linear or nonlinear classifiers. In this approach, the feature definition problem is replaced by that of finding good similarity measures. The proposal...
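A minimal sketch of the idea, assuming Euclidean distance as the similarity and a plain perceptron as the linear classifier (both our own choices, not necessarily the paper's): each object is represented by its vector of distances to a handful of support objects, and a linear classifier is trained on that representation:

```python
import math, random

random.seed(1)
# Two Gaussian classes in the plane, labels -1 / +1.
data = [((random.gauss(0, 1), random.gauss(0, 1)), -1) for _ in range(100)] \
     + [((random.gauss(3, 1), random.gauss(3, 1)), +1) for _ in range(100)]
random.shuffle(data)
train, test = data[:150], data[150:]

# Support objects: a small subset of the training objects. Every object is
# then represented by its distances to the supports instead of by features.
supports = [x for x, _ in train[:6]]
def rep(x):
    return [math.dist(x, s) for s in supports]

# A linear classifier on the distance representation (plain perceptron).
w, b = [0.0] * len(supports), 0.0
for _ in range(50):
    for x, y in train:
        r = rep(x)
        if y * (sum(wi * ri for wi, ri in zip(w, r)) + b) <= 0:
            w = [wi + y * ri for wi, ri in zip(w, r)]
            b += y

acc = sum(1 for x, y in test
          if y * (sum(wi * ri for wi, ri in zip(w, rep(x))) + b) > 0) / len(test)
```

Note that no features are ever inspected by the classifier; only the 6-dimensional distance vectors are, which is the point of the representation.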
An iterative fuzzy clustering method is proposed to partition a set of multivariate binary observation vectors located at neighboring geographic sites. The method described here applies, in a binary setup, a recently proposed algorithm called Neighborhood EM, which seeks a partition that is both well clustered in the feature space and spatially regular [AmbroiseNEM1996]. This approach is derived from the EM algorithm applied to mixture models [Dempster1977], viewed as an alternate optimization method...
Classifiers can be combined to reduce classification errors. We performed experiments on a data set consisting of different sets of features of handwritten digits. Different types of classifiers were trained on these feature sets, and the performances of these classifiers and of the combination rules were tested. The best results were obtained with the mean, median, and product combination rules: the product was best for combining linear classifiers, the median for k-NN classifiers. Training a classifier on all features...
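The three combination rules can be illustrated on hypothetical posterior estimates (the numbers below are invented for illustration, not from the experiments):

```python
import math, statistics

# Posterior estimates for one test object from three hypothetical classifiers
# (rows: classifiers; columns: classes A, B).
posteriors = [[0.6, 0.4],
              [0.7, 0.3],
              [0.1, 0.9]]   # one classifier strongly disagrees

def combine(posteriors, rule):
    # Apply a fixed combination rule per class column, then renormalize.
    cols = list(zip(*posteriors))
    if rule == "mean":
        scores = [statistics.fmean(c) for c in cols]
    elif rule == "median":
        scores = [statistics.median(c) for c in cols]
    else:  # "product"
        scores = [math.prod(c) for c in cols]
    s = sum(scores)
    return [v / s for v in scores]

mean_p = combine(posteriors, "mean")     # averaging only damps the outlier
med_p = combine(posteriors, "median")    # robust: the outlier is ignored
prod_p = combine(posteriors, "product")  # one small value can veto a class
```

On this example the median sides with the two agreeing classifiers, while mean and product are swayed by the dissenting one; this is the usual intuition for why the median suits noisy estimates and the product suits well-calibrated, independent ones.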
We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm (also known as the Lloyd algorithm). In particular, we exploit the property that this algorithm can get stuck in local optima of the k-means objective function. We are interested in the actual clustering, not only in the cost of the solution. We analyze when different initializations lead to the same local optimum, and when they...
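The local-optima phenomenon is easy to reproduce. The sketch below (our own minimal 1-D Lloyd implementation, not the paper's code) runs the same data from two initializations and reaches two different fixed points with different clusterings:

```python
def lloyd(points, centers, iters=100):
    # Plain Lloyd iterations: assign each point to its nearest center,
    # then recompute each center as the mean of its cluster.
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            clusters[j].append(x)
        new = [sum(c) / len(c) if c else centers[j]
               for j, c in enumerate(clusters)]
        if new == centers:          # fixed point reached
            break
        centers = new
    labels = tuple(min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
                   for x in points)
    return centers, labels

pts = [0.0, 1.0, 6.0, 7.0, 10.0, 11.0]
_, labels_a = lloyd(pts, [0.5, 10.5])   # settles on {0,1} | {6,7,10,11}
_, labels_b = lloyd(pts, [3.5, 10.5])   # settles on {0,1,6,7} | {10,11}
```

Both runs terminate at a local optimum of the k-means objective, but the resulting clusterings differ, which is exactly the distinction between stability of costs and stability of clusterings.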
We study the setting of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is whether and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures...
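A small illustration of how sensitive the graph-construction step can be, using only the connectivity structure of an ε-neighborhood graph (the input to any graph clustering method); the data set and the two radii are our own toy choices:

```python
import math, random

def eps_graph_components(points, eps):
    # Number of connected components of the eps-neighborhood graph,
    # computed with union-find (path halving).
    parent = list(range(len(points)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

random.seed(3)
blob = lambda cx: [(cx + random.gauss(0, 0.3), random.gauss(0, 0.3))
                   for _ in range(30)]
# Two well-separated blobs plus a single "bridge" point between them.
pts = blob(0.0) + blob(6.0) + [(3.0, 0.0)]

small = eps_graph_components(pts, 1.0)   # bridge too far from either blob
large = eps_graph_components(pts, 3.5)   # bridge now links the two blobs
```

Any graph clustering algorithm run on these two graphs must produce different partitions, since the coarse connectivity already differs; this is the kind of dependence on the construction step that the quoted question is about.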
The purpose of feature selection in machine learning is at least two-fold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of the models and the classification rate of classifiers on previously unknown data. Yet it has been shown recently that the process of feature selection itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit...
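The over-fitting effect can be demonstrated on pure noise: picking the best of many noise features on the full data yields an optimistic in-sample accuracy, while selecting on a training half and evaluating on a held-out half does not. This is a generic illustration of selection bias under assumptions of ours, not the paper's experiment:

```python
import random, statistics

random.seed(4)
n, d = 50, 5000                    # few samples, many pure-noise features
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [random.choice([-1, 1]) for _ in range(n)]

def class_means(j, rows):
    # Per-class means of feature j over the given row indices.
    a = [X[i][j] for i in rows if y[i] == +1]
    b = [X[i][j] for i in rows if y[i] == -1]
    return statistics.fmean(a), statistics.fmean(b)

def best_feature(rows):
    # Univariate selection: the feature whose class means differ most.
    def gap(j):
        mp, mm = class_means(j, rows)
        return abs(mp - mm)
    return max(range(d), key=gap)

def accuracy(j, fit_rows, eval_rows):
    # Classify each evaluation row by the nearer class mean of feature j.
    mp, mm = class_means(j, fit_rows)
    hits = sum(1 for i in eval_rows
               if (+1 if abs(X[i][j] - mp) < abs(X[i][j] - mm) else -1) == y[i])
    return hits / len(eval_rows)

rows, half = list(range(n)), n // 2
# Biased protocol: the feature is chosen and evaluated on the same data.
biased = accuracy(best_feature(rows), rows, rows)
# Honest protocol: selection and fitting on one half, evaluation on the other.
honest = accuracy(best_feature(rows[:half]), rows[:half], rows[half:])
```

Since the labels are independent of every feature, any accuracy above chance in the biased protocol is produced by the selection step alone.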
The paper gives a new interpretation and a possible optimization of the well-known k-means algorithm for searching for a locally optimal partition of a set into k disjoint nonempty subsets. For this purpose, a new divided k-means algorithm was constructed as a limit case of the known smoothed k-means algorithm. It is shown that the algorithm constructed in this way coincides with the k-means algorithm if during the iterative procedure no data points appear on the boundaries of the Voronoi diagram...
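A toy 1-D sketch of the smoothing idea, in our own notation (not the paper's construction): the hard minimum over centers is replaced by a soft-min with a temperature eps, and letting eps tend to 0 recovers the plain k-means (Lloyd) update:

```python
import math

pts = [0.0, 1.0, 9.0, 10.0]

def smoothed_kmeans(pts, centers, eps, iters=200):
    # Smoothed k-means: the hard min over centers becomes a soft-min; point x
    # gets weight w_k proportional to exp(-(x - c_k)^2 / eps) for center k
    # (shifted by the minimum squared distance for numerical stability).
    for _ in range(iters):
        W = []
        for x in pts:
            d = [(x - c) ** 2 for c in centers]
            m = min(d)
            e = [math.exp(-(dk - m) / eps) for dk in d]
            s = sum(e)
            W.append([v / s for v in e])
        centers = [sum(W[j][k] * pts[j] for j in range(len(pts)))
                   / sum(W[j][k] for j in range(len(pts)))
                   for k in range(len(centers))]
    return centers

soft = smoothed_kmeans(pts, [2.0, 8.0], eps=20.0)  # visibly softened centers
hard = smoothed_kmeans(pts, [2.0, 8.0], eps=1e-2)  # eps -> 0: plain k-means
```

With a tiny eps the weights are effectively 0/1 and the update is exactly Lloyd's; with a large eps each center is pulled toward the overall mean by the points of the other cluster.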