Data imputation in switchback designs using a mixed model with correlated errors.
The paper presents data mining methods applied to gene selection for recognition of a particular type of prostate cancer on the basis of gene expression arrays. Several chosen methods of gene selection, including the Fisher method, correlation of gene with a class, application of the support vector machine and statistical hypotheses, are compared on the basis of clustering measures. The results of applying these individual selection methods are combined to identify the most often selected...
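As a rough illustration of the Fisher method named in the abstract above, the following sketch scores each gene by the squared gap between the two class means divided by the sum of the class variances, then ranks genes by that score. The abstract does not give the formula, so this common form of the Fisher criterion is an assumption; the function names and the toy expression values are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(class_a, class_b):
    """Fisher criterion for one gene (assumed form, not taken from the paper):
    squared difference of class means over the sum of class variances."""
    gap = mean(class_a) - mean(class_b)
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        # Both classes constant: perfectly separated if the means differ.
        return 0.0 if gap == 0 else float("inf")
    return gap * gap / spread


def rank_genes(expr_a, expr_b):
    """Return gene indices ordered from most to least discriminative.

    expr_a, expr_b: lists of samples; each sample is a list with one
    expression value per gene.
    """
    n_genes = len(expr_a[0])
    scores = [
        fisher_score([s[g] for s in expr_a], [s[g] for s in expr_b])
        for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


# Made-up toy data: 3 samples per class, 3 genes; only gene 1 separates the classes.
tumour = [[1.0, 9.0, 5.0], [1.2, 9.5, 4.0], [0.9, 8.5, 6.0]]
normal = [[1.1, 2.0, 5.5], [1.0, 2.5, 4.5], [1.3, 1.5, 5.0]]
ranking = rank_genes(tumour, normal)
print(ranking)
```

On this toy input gene 1 is ranked first. Combining several selection methods, as the abstract describes, could then amount to counting how often each gene appears near the top of the individual rankings.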
In the companion paper [C. Maugis and B. Michel, A non asymptotic penalized criterion for Gaussian mixture model selection. ESAIM: P&S 15 (2011) 41–68], a penalized likelihood criterion is proposed to select a Gaussian mixture model among a specific model collection. This criterion depends on unknown constants which have to be calibrated in practical situations. A “slope heuristics” method is described and tested experimentally to address this practical problem. In a model-based clustering context, the...
In this paper we comment on some papers written by Jerzy K. Baksalary. In particular, we draw attention to the development of some specific research ideas and papers, now that more than 15 years have passed since their publication.
Six different functions measuring the defect of a quasi-copula, i.e., how far away it is from a copula, are discussed. This is done by means of extremal non-positive volumes of specific rectangles (in a way that a zero defect characterizes copulas). Based on these defect functions, six transformations of quasi-copulas are investigated which give rise to six different partitions of the set of all quasi-copulas. For each of these partitions, each equivalence class contains exactly one copula being...
2000 Mathematics Subject Classification: 68T01, 62H30, 32C09. Locally Linear Embedding (LLE) has gained prominence as a tool in unsupervised non-linear dimensionality reduction. While the algorithm aims to preserve certain proximity relations between the observed points, this may not always be desirable if the shape in higher dimensions that we are trying to capture is observed with noise. This note suggests that a desirable first step is to remove or at least reduce the noise in the observations before...
The following three results for the general multivariate Gauss-Markoff model with a singular covariance matrix are given or indicated: determinant ratios as products of independent chi-square distributions, moments of the determinants, and a method of obtaining approximate densities of the determinants.
A conditional variance is an indicator of the level of independence between two random variables. We exploit this intuitive relationship and define a measure v which is almost a measure of mutual complete dependence. Unsurprisingly, the measure attains its minimum value for many pairs of non-independent random variables. Adjusting the measure so as to make it invariant under all Borel measurable injective transformations, we obtain a copula-based measure of dependence v* satisfying A. Rényi’s...
Despite its many shortcomings, Pearson’s rho is often used as an association measure for stock returns. A conditional version of Spearman’s rho is suggested as an alternative measure of association. This approach is purely nonparametric and avoids any kind of model misspecification. We derive hypothesis tests for the conditional rank-correlation coefficients particularly arising in bull and bear markets and study their finite-sample performance by Monte Carlo simulation. Further, the daily returns...
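To make the idea of a regime-dependent rank correlation concrete, here is a minimal sketch. It is not the paper's estimator: the truncated abstract does not define the conditioning event, so the sketch assumes a simple one, namely splitting observations by the sign of a market index return and re-ranking within each subset. All function names and the toy returns are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; raises ZeroDivisionError for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def conditional_spearman(x, y, market, regime):
    """Spearman's rho of x and y restricted to one regime.

    Assumption of this sketch: 'bull' means market return > 0,
    anything else counts as 'bear'. Ranks are recomputed in the subset.
    """
    keep = [i for i, m in enumerate(market) if (m > 0) == (regime == "bull")]
    if len(keep) < 2:
        raise ValueError("need at least two observations in the regime")
    return spearman_rho([x[i] for i in keep], [y[i] for i in keep])


# Made-up returns: the two stocks move oppositely in bull days, together in bear days.
market = [0.01, 0.02, 0.03, -0.01, -0.02, -0.03]
stock_x = [0.1, 0.2, 0.3, -0.1, -0.2, -0.3]
stock_y = [0.3, 0.2, 0.1, -0.1, -0.2, -0.3]
rho_bull = conditional_spearman(stock_x, stock_y, market, "bull")
rho_bear = conditional_spearman(stock_x, stock_y, market, "bear")
print(rho_bull, rho_bear)
```

Because only ranks enter the calculation, the result does not change under monotone increasing transformations of either return series, which is the nonparametric property the abstract emphasizes.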
Detecting a cluster structure is considered. This means solving either the problem of discovering a natural decomposition of data points into groups (clusters) or the problem of detecting clouds of data points of a specific form. In this paper both these problems are considered. To discover a cluster structure of a specific arrangement or a cloud of data of a specific form a class of nonlinear projections is introduced. Fitness functions that estimate to what extent a given subset of data points...
A method of geometrical characterization of multidimensional data sets, including construction of the convex hull of the data and calculation of the volume of the convex hull, is described. This technique, together with the concept of minimum convex hull volume, can be used for detection of influential points or outliers in multiple linear regression. An approximation to the true concept is achieved by ordering the data into a linear sequence such that the volume of the convex hull of the first...
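The convex hull idea can be illustrated in two dimensions, where the hull "volume" is an area. The sketch below builds the hull with Andrew's monotone chain algorithm, computes its area with the shoelace formula, and reports how much the area shrinks when each point is left out. This leave-one-out score is a simplification chosen for illustration and is an assumption; it is not the ordering procedure of the paper, whose description is cut off above. All names and data are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area of the convex hull via the shoelace formula (0 for degenerate hulls)."""
    hull = convex_hull(points)
    n = len(hull)
    twice = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice) / 2


def area_drop(points):
    """For each point, the reduction in hull area when that point is removed."""
    full = hull_area(points)
    return [full - hull_area(points[:i] + points[i + 1:]) for i in range(len(points))]


# Made-up data: a unit square, its centre, and one distant point.
data = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5), (5, 5)]
drops = area_drop(data)
print(hull_area(data), drops)
```

Here the distant point (5, 5) produces the largest drop, so it would be the first candidate for an influential point. Interior points give a drop of zero, since removing them leaves the hull unchanged.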
In two articles, of which this is the first, two examples of statistical analysis by factorial methods are presented. The mathematical framework of the exposition is algebraic. The present formulation of these problems draws on teaching experience gained at the UER de Mathématiques, Logique Formelle et Informatique of the Université René-Descartes, as well as on a write-up published in the proceedings of the Colloque «Analyse des données en architecture et urbanisme» [5].