Displaying similar documents to “Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality of high-dimensional data”

Survival analysis on data streams: Analyzing temporal events in dynamically changing environments

Ammar Shaker, Eyke Hüllermeier (2014)

International Journal of Applied Mathematics and Computer Science

In this paper, we introduce a method for survival analysis on data streams. Survival analysis (also known as event history analysis) is an established statistical method for the study of temporal “events” or, more specifically, questions regarding the temporal distribution of the occurrence of events and their dependence on covariates of the data sources. To make this method applicable in the setting of data streams, we propose an adaptive variant of a model that is closely related to...

A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining

Ernest Fokoue (2014)

Serdica Journal of Computing

Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough idea of a possible taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of...

A complete gradient clustering algorithm formed with kernel estimators

Piotr Kulczycki, Małgorzata Charytanowicz (2010)

International Journal of Applied Mathematics and Computer Science

The aim of this paper is to provide a gradient clustering algorithm in its complete form, suitable for direct use without requiring a deeper statistical knowledge. The values of all parameters are effectively calculated using optimizing procedures. Moreover, an illustrative analysis of the meaning of particular parameters is shown, followed by the effects resulting from possible modifications with respect to their primarily assigned optimal values. The proposed algorithm does not demand...

Survival analysis with coarsely observed covariates

Soren Feodor Nielsen (2003)

SORT

In this paper we consider analysis of survival data with incomplete covariate information. We model the incomplete covariates as a random coarsening of the complete covariate, and an overview of the theory of coarsening at random is given. Various ways of estimating the parameters of the model for the survival data given the covariates are discussed and compared.

Protecting micro-data by micro-aggregation: the experience in Eurostat

Daniel Defays (1997)

Qüestiió

A natural strategy to protect the confidentiality of individual data is to aggregate them at the lowest possible level. Some studies carried out at Eurostat on this topic are presented: properties of classifications into clusters of fixed size, micro-aggregation as a generic method to protect the confidentiality of individual data, and an application to the Community Innovation Survey. The work performed at Eurostat is placed in the context of other projects conducted at the European level on the...

Detecting a data set structure through the use of nonlinear projections search and optimization

Victor L. Brailovsky, Michael Har-Even (1998)

Kybernetika

The problem of detecting a cluster structure is considered. This means solving either the problem of discovering a natural decomposition of data points into groups (clusters) or the problem of detecting clouds of data points of a specific form. In this paper, both problems are considered. To discover a cluster structure of a specific arrangement, or a cloud of data of a specific form, a class of nonlinear projections is introduced. Fitness functions that estimate to what extent a given subset of...

Ridge estimation of covariance matrix from data in two classes

Yi Zhou, Bin Zhang (2024)

Applications of Mathematics

This paper deals with the problem of estimating a covariance matrix from data in two classes: (1) good data with the covariance matrix of interest and (2) contamination coming from a Gaussian distribution with a different covariance matrix. A ridge penalty is introduced to address the challenges of estimating the covariance matrix from the two-class data model in high dimensions. The ridge estimator of the covariance matrix has a uniform expression and remains positive definite,...

An algorithm for reducing the dimension and size of a sample for data exploration procedures

Piotr Kulczycki, Szymon Łukasik (2014)

International Journal of Applied Mathematics and Computer Science

The paper deals with the issue of reducing the dimension and size of a data set (random sample) for exploratory data analysis procedures. The algorithm investigated here is based on a linear transformation to a lower-dimensional space that preserves the distances between particular elements as closely as possible. Elements of the transformation matrix are computed using the parallel fast simulated annealing metaheuristic. Moreover, elimination of or a decrease...

Data mining techniques using decision tree model in materialised projection and selection view

Y. W. Teh (2004)

Mathware and Soft Computing

With the availability of very large data storage today, redundant data structures are no longer a big issue. However, an intelligent way of managing materialised projection and selection views that can lead to fast access to data is the central issue dealt with in this paper. A set of implementation steps for data warehouse administrators or decision makers to improve the response time of queries is also defined. The study concludes that both attributes and tuples are important...