Displaying similar documents to “An algorithm for reducing the dimension and size of a sample for data exploration procedures”

Detecting a data set structure through the use of nonlinear projections search and optimization

Victor L. Brailovsky, Michael Har-Even (1998)

Kybernetika

Detecting a cluster structure is considered. This means solving either the problem of discovering a natural decomposition of data points into groups (clusters) or the problem of detecting clouds of data points of a specific form. In this paper both these problems are considered. To discover a cluster structure of a specific arrangement or a cloud of data of a specific form, a class of nonlinear projections is introduced. Fitness functions that estimate to what extent a given subset of...

Clustering of Symbolic Data based on Affinity Coefficient: Application to a Real Data Set

Áurea Sousa, Helena Bacelar-Nicolau, Fernando C. Nicolau, Osvaldo Silva (2013)

Biometrical Letters

In this paper, we illustrate an application of Ascendant Hierarchical Cluster Analysis (AHCA) to complex data taken from the literature (interval data), based on the standardized weighted generalized affinity coefficient, by the method of Wald and Wolfowitz. The probabilistic aggregation criteria used belong to a parametric family of methods under the probabilistic approach of AHCA, named VL methodology. Finally, we compare the results achieved using our approach with those obtained...

A Comparative Analysis of Predictive Learning Algorithms on High-Dimensional Microarray Cancer Data

Jo Bill, Ernest Fokoue (2014)

Serdica Journal of Computing

This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ is less than one. We explore the statistical and computational challenges inherent in these high dimensional low sample size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent...

Application of agent-based simulated annealing and tabu search procedures to solving the data reduction problem

Ireneusz Czarnowski, Piotr Jędrzejowicz (2011)

International Journal of Applied Mathematics and Computer Science

The problem considered concerns data reduction for machine learning. Data reduction aims at deciding which features and instances from the training set should be retained for further use during the learning process. Data reduction results in increased capabilities and generalization properties of the learning model and a shorter time of the learning process. It can also help in scaling up to large data sources. The paper proposes an agent-based data reduction approach with the learning...

Data mining techniques using decision tree model in materialised projection and selection view.

Y. W. Teh (2004)

Mathware and Soft Computing

With the availability of very large data storage today, redundant data structures are no longer a big issue. However, an intelligent way of managing materialised projection and selection views that can lead to fast access of data is the central issue dealt with in this paper. A set of implementation steps for the data warehouse administrators or decision makers to improve the response time of queries is also defined. The study concludes that both attributes and tuples are important...

An alternative extension of the k-means algorithm for clustering categorical data

Ohn San, Van-Nam Huynh, Yoshiteru Nakamori (2004)

International Journal of Applied Mathematics and Computer Science

Most of the earlier work on clustering has mainly been focused on numerical data whose inherent geometric properties can be exploited to naturally define distance functions between data points. Recently, the problem of clustering categorical data has started drawing interest. However, the computational cost makes most of the previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect. At the same time, working...

Analysis of correlation based dimension reduction methods

Yong Joon Shin, Cheong Hee Park (2011)

International Journal of Applied Mathematics and Computer Science

Dimension reduction is an important topic in data mining and machine learning. In particular, dimension reduction combined with feature fusion is an effective preprocessing step when the data are described by multiple feature sets. Canonical Correlation Analysis (CCA) and Discriminative Canonical Correlation Analysis (DCCA) are feature fusion methods based on correlation. However, they are different in that DCCA is a supervised method utilizing class label information, while CCA is an unsupervised...