Displaying similar documents to “Automatic error localisation for categorical, continuous and integer data.”

Detecting a data set structure through the use of nonlinear projections search and optimization

Victor L. Brailovsky, Michael Har-Even (1998)

Kybernetika

Detecting a cluster structure is considered. This means solving either the problem of discovering a natural decomposition of data points into groups (clusters) or the problem of detecting clouds of data points of a specific form. In this paper both of these problems are considered. To discover a cluster structure of a specific arrangement or a cloud of data of a specific form, a class of nonlinear projections is introduced. Fitness functions that estimate to what extent a given subset of...

Linear discriminant analysis with a generalization of the Moore-Penrose pseudoinverse

Tomasz Górecki, Maciej Łuczak (2013)

International Journal of Applied Mathematics and Computer Science

The Linear Discriminant Analysis (LDA) technique is an important and well-developed area of classification, and to date many linear (and also nonlinear) discrimination methods have been put forward. A complication in applying LDA to real data occurs when the number of features exceeds that of observations. In this case, the covariance estimates do not have full rank, and thus cannot be inverted. There are a number of ways to deal with this problem. In this paper, we propose improving...
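
The rank problem described in this abstract can be illustrated with a minimal sketch. It uses the standard Moore-Penrose pseudoinverse (`numpy.linalg.pinv`), not the generalization the paper proposes, and the function name `pinv_lda_direction` and the synthetic data are assumptions made only for illustration.

```python
import numpy as np

def pinv_lda_direction(X0, X1):
    """Two-class LDA direction using the standard Moore-Penrose pseudoinverse.

    Hypothetical helper for illustration; not the method from the paper.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix, shape (p, p).
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    # With more features than observations Sw is rank-deficient,
    # so the ordinary inverse does not exist; pinv still returns a result.
    w = np.linalg.pinv(Sw) @ (mu1 - mu0)
    return w, Sw

# Synthetic example: 10 observations in total, 20 features (p > n).
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(5, 20))
X1 = rng.normal(1.0, 1.0, size=(5, 20))

w, Sw = pinv_lda_direction(X0, X1)
rank = np.linalg.matrix_rank(Sw)  # at most (5 - 1) + (5 - 1) = 8, below p = 20
```

Here the scatter matrix has rank at most 8 while its dimension is 20, which is the situation in which the covariance estimate cannot be inverted directly.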

Experiments with two Approaches for Tracking Drifting Concepts

Koychev, Ivan (2007)

Serdica Journal of Computing

This paper addresses the task of learning classifiers from streams of labelled data. In this setting, the underlying concepts may change over time. The paper studies two mechanisms developed for dealing with changing concepts. Both are based on the time window idea. The first one forgets gradually, by assigning to the examples a weight that gradually decreases over time. The second one uses a statistical test to detect changes in concept and then optimizes the...
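
The gradual forgetting idea in this abstract can be sketched as follows. The linear weight profile, the parameter `k`, and the function name `linear_forgetting_weights` are assumptions chosen for illustration; the abstract does not state the exact weighting function the paper uses.

```python
import numpy as np

def linear_forgetting_weights(n, k=0.5):
    """Weights for n examples ordered oldest to newest.

    Assumed profile: weights rise linearly from 1 - k (oldest)
    to 1 + k (newest), so their mean is 1. Hypothetical helper.
    """
    if n == 1:
        return np.ones(1)
    return np.linspace(1.0 - k, 1.0 + k, n)

# Labels arriving over time; the concept drifts from class 0 to class 1.
labels = np.array([0, 0, 0, 0, 1, 1, 1])

w = linear_forgetting_weights(len(labels), k=1.0)
unweighted_vote = labels.mean()                 # treats all examples equally
weighted_vote = np.average(labels, weights=w)   # favours recent examples
```

With equal weights the older class 0 examples still dominate the vote, whereas the time-decaying weights shift the estimate toward the more recent class 1 examples.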

An algorithm for reducing the dimension and size of a sample for data exploration procedures

Piotr Kulczycki, Szymon Łukasik (2014)

International Journal of Applied Mathematics and Computer Science

The paper deals with the issue of reducing the dimension and size of a data set (random sample) for exploratory data analysis procedures. The concept of the algorithm investigated here is based on linear transformation to a space of a smaller dimension, while retaining as much as possible the same distances between particular elements. Elements of the transformation matrix are computed using the metaheuristics of parallel fast simulated annealing. Moreover, elimination of or a decrease...

Application of agent-based simulated annealing and tabu search procedures to solving the data reduction problem

Ireneusz Czarnowski, Piotr Jędrzejowicz (2011)

International Journal of Applied Mathematics and Computer Science

The problem considered concerns data reduction for machine learning. Data reduction aims at deciding which features and instances from the training set should be retained for further use during the learning process. Data reduction results in increased capabilities and generalization properties of the learning model and a shorter time of the learning process. It can also help in scaling up to large data sources. The paper proposes an agent-based data reduction approach with the learning...

A Comparative Analysis of Predictive Learning Algorithms on High-Dimensional Microarray Cancer Data

Bill, Jo; Fokoue, Ernest (2014)

Serdica Journal of Computing

This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ = n/p is less than one. We explore the statistical and computational challenges inherent in these high-dimensional, low-sample-size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent...