Displaying similar documents to “Linear discriminant analysis with a generalization of the Moore-Penrose pseudoinverse”

Intrinsic dimensionality and small sample properties of classifiers

Šarūnas Raudys (1998)

Kybernetika

Small learning-set properties of the Euclidean distance, the Parzen window, the minimum empirical error and the nonlinear single layer perceptron classifiers depend on an “intrinsic dimensionality” of the data; however, the Fisher linear discriminant function is sensitive to all dimensions. There is no unique definition of the “intrinsic dimensionality”. The dimensionality of the subspace where the data points are situated is not a sufficient definition of the “intrinsic dimensionality”....

Automatic error localisation for categorical, continuous and integer data.

Ton de Waal (2005)

SORT

Data collected by statistical offices generally contain errors, which have to be corrected before reliable data can be published. This correction process is referred to as statistical data editing. At statistical offices, certain rules, so-called edits, are often used during the editing process to determine whether a record is consistent or not. Inconsistent records are considered to contain errors, while consistent records are considered error-free. In this article we focus on automatic...
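
The consistency check described in this abstract can be illustrated with a minimal sketch. The rule names, record fields and thresholds below are hypothetical and are not taken from the article; the sketch only tests a record against a set of edits and does not attempt the automatic error localisation the article goes on to discuss.

```python
# Hypothetical edits: each maps a rule name to a predicate that a
# consistent record must satisfy. Field names are illustrative only.
EDITS = {
    # integer data: age cannot be negative
    "age_non_negative": lambda r: r["age"] >= 0,
    # categorical + integer data: very young people cannot be married
    "minors_not_married": lambda r: r["age"] >= 16 or r["marital_status"] != "married",
    # continuous data: income components must add up to the total
    "income_parts_sum": lambda r: abs(r["wages"] + r["other_income"] - r["total_income"]) < 1e-9,
}


def failed_edits(record):
    """Return the names of all edits that the record violates."""
    return [name for name, rule in EDITS.items() if not rule(record)]


def is_consistent(record):
    """A record is consistent when it violates no edit."""
    return not failed_edits(record)


good = {"age": 34, "marital_status": "married",
        "wages": 30000.0, "other_income": 500.0, "total_income": 30500.0}
bad = {"age": 12, "marital_status": "married",
       "wages": 30000.0, "other_income": 500.0, "total_income": 40000.0}

print(is_consistent(good))
print(failed_edits(bad))
```

Knowing which edits fail is only the starting point: deciding which fields of an inconsistent record are actually wrong is a separate problem.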

The post randomisation method for protecting microdata.

José Gouweleeuw, Peter Kooiman, Leon Willenborg, Peter-Paul De Wolf (1998)

Qüestiió

This paper describes the Post Randomisation Method (PRAM) for disclosure protection of microdata. Applying PRAM means that, for each record in the data file, the score on a number of variables is changed according to a specified probability mechanism. Since this probability mechanism is known, the characteristics of the latent true data can be estimated without bias from the observed data moments in the perturbed file. PRAM is applied to categorical variables. It is shown that...
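
The idea in this abstract can be sketched as follows, assuming the probability mechanism is a row-stochastic transition matrix over the categories of a single variable. The matrix values, sample size and function names are illustrative assumptions, not taken from the paper. Because the expected perturbed counts equal the transposed matrix times the true counts, solving that linear system gives an unbiased estimate of the true counts.

```python
import numpy as np


def apply_pram(values, transition, rng):
    """Perturb category codes: a value k is replaced by j with
    probability transition[k, j] (each row sums to 1)."""
    n_categories = transition.shape[0]
    return np.array([rng.choice(n_categories, p=transition[v]) for v in values])


def estimate_true_counts(observed_counts, transition):
    """Estimate the original category counts from the perturbed ones.

    E[observed] = transition.T @ true, so solve that system for `true`.
    """
    return np.linalg.solve(transition.T, observed_counts)


# Illustrative (assumed) mechanism for a 3-category variable.
transition = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
])

rng = np.random.default_rng(0)
true_values = rng.integers(0, 3, size=5000)      # synthetic "true" microdata
perturbed = apply_pram(true_values, transition, rng)

true_counts = np.bincount(true_values, minlength=3)
observed_counts = np.bincount(perturbed, minlength=3)
estimated_counts = estimate_true_counts(observed_counts, transition)

print(true_counts, observed_counts, np.round(estimated_counts, 1))
```

The estimate is unbiased but not exact for a single perturbed file, and it requires the transition matrix to be invertible.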

An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects

Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Janusz Krzanowski, Carlos Tadeu dos Santos Dias (2014)

Biometrical Letters

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation...

An algorithm for reducing the dimension and size of a sample for data exploration procedures

Piotr Kulczycki, Szymon Łukasik (2014)

International Journal of Applied Mathematics and Computer Science

The paper deals with the issue of reducing the dimension and size of a data set (random sample) for exploratory data analysis procedures. The concept of the algorithm investigated here is based on linear transformation to a space of a smaller dimension, while retaining as much as possible the same distances between particular elements. Elements of the transformation matrix are computed using the metaheuristics of parallel fast simulated annealing. Moreover, elimination of or a decrease...

Correlation-based feature selection strategy in classification problems

Krzysztof Michalak, Halina Kwaśnicka (2006)

International Journal of Applied Mathematics and Computer Science

In classification problems, the issue of high dimensionality of data is often considered important. To lower data dimensionality, feature selection methods are often employed. To select a set of features that will span a representation space that is as good as possible for the classification task, one must take into consideration possible interdependencies between the features. As a trade-off between the complexity of the selection process and the quality of the selected feature set,...