Approaches to Identification of Linear Relations from Compound Noisy and Noise-Free Data
Canonical non-symmetrical correspondence analysis is developed as an alternative method for constrained ordination, relating external information (e.g., environmental variables) with ecological data, considering species abundance as dependent on sites. Ordination axes are restricted to be linear combinations of the environmental variables, based on the information of the most abundant species. This extension and its associated unconstrained ordination method are terms of a global model that permits...
In a spiked population model, the population covariance matrix has all its eigenvalues equal to one except for a few fixed eigenvalues (spikes). This model was proposed by Johnstone to account for empirical findings on various data sets. The question is to quantify the effect of the perturbation caused by the spike eigenvalues. A recent work by Baik and Silverstein establishes the almost sure limits of the extreme sample eigenvalues associated with the spike eigenvalues when the population and the...
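The limit mentioned above can be illustrated for the simplest case. The following is a minimal sketch, assuming a single spike larger than one and the form of the Baik-Silverstein limit as it is commonly stated; the function name and the symbols `spike` (population spike eigenvalue) and `c` (limiting ratio of dimension to sample size) are illustrative notation, not taken from the abstract.

```python
import math


def largest_eigenvalue_limit(spike: float, c: float) -> float:
    """Almost-sure limit of the largest sample eigenvalue (single-spike sketch).

    spike: population spike eigenvalue (assumed > 0).
    c:     limiting ratio p/n of dimension to sample size (assumed > 0).
    """
    if spike <= 0 or c <= 0:
        raise ValueError("spike and c must be positive")
    threshold = 1.0 + math.sqrt(c)
    if spike > threshold:
        # The spike is large enough to pull the top sample eigenvalue
        # out of the bulk of the spectrum.
        return spike + c * spike / (spike - 1.0)
    # Otherwise the top sample eigenvalue sticks to the right edge
    # of the Marchenko-Pastur bulk.
    return (1.0 + math.sqrt(c)) ** 2


if __name__ == "__main__":
    for s in (1.2, 1.5, 3.0):
        print(s, largest_eigenvalue_limit(s, c=0.25))
```

The two branches agree at the threshold `1 + sqrt(c)`, so the limit is continuous in the spike value; spikes below the threshold leave no trace in the largest sample eigenvalue.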
One limitation of multiple correspondence analysis applied to large tables of qualitative data is the difficulty of analysing and interpreting the structures of relations between variables. To go beyond the purely descriptive stage, a methodology is proposed for finding implication schemes based on the conditional frequencies given by Burt tables. Multiple correspondence analysis is used there as the main filter of variables from...
Although a nonlinear discrimination function may be superior to linear or quadratic classifiers, it is difficult to construct such a function. In this paper, we propose a method to construct a nonlinear discrimination function using Legendre polynomials. The optimal set of Legendre polynomials is selected by the MDL (Minimum Description Length) criterion. Results on many real data sets show the effectiveness of this method.
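As a rough illustration of the building block involved, the sketch below evaluates Legendre polynomials with the standard three-term (Bonnet) recurrence, which could serve as basis features for a discrimination function. It is not the authors' method: the function name is hypothetical, inputs are assumed to be scaled to [-1, 1], and the MDL-based selection step is not shown.

```python
def legendre_features(x, max_degree):
    """Return [P_0(x), P_1(x), ..., P_max_degree(x)].

    Uses the recurrence (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x).
    x is assumed to lie in [-1, 1], the interval on which the Legendre
    polynomials are orthogonal.
    """
    if max_degree < 0:
        raise ValueError("max_degree must be non-negative")
    values = [1.0]  # P_0(x) = 1
    if max_degree >= 1:
        values.append(float(x))  # P_1(x) = x
    for n in range(1, max_degree):
        nxt = ((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1)
        values.append(nxt)
    return values


if __name__ == "__main__":
    print(legendre_features(0.5, 3))
```

A discrimination function could then be formed as a weighted sum of a chosen subset of these features, with the subset picked by a model-selection criterion such as MDL.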
Correspondence analysis followed by clustering of both rows and columns of a data matrix is proposed as an approach to two-way clustering. The novelty of this contribution consists of: i) proposing a simple method for selecting the number of axes; ii) visualizing the data matrix as is done in micro-array analysis; iii) enhancing this representation by emphasizing those variables and those individuals which are 'well represented' in the subspace of the chosen axes. The approach is applied...
Statistical analysis of compositional data, multivariate observations carrying only relative information (proportions, percentages), should be performed only in orthonormal coordinates with respect to the Aitchison geometry on the simplex. In the case of three-part compositions it is possible to decompose the covariance structure of the well-known principal components using variances of log-ratios of the original parts. They seem to be helpful for the interpretation of these special orthonormal coordinates....
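For concreteness, one common choice of orthonormal coordinates for a three-part composition is sketched below. The particular basis (often called pivot or isometric log-ratio coordinates) and the function name are assumptions made for illustration; the abstract does not specify which coordinates the authors use.

```python
import math


def ilr_three_part(x1, x2, x3):
    """Orthonormal log-ratio coordinates (z1, z2) of a three-part composition.

    All parts must be strictly positive. Only ratios of parts enter the
    formulas, so rescaling the composition does not change the result.
    """
    if min(x1, x2, x3) <= 0:
        raise ValueError("all parts must be strictly positive")
    # z1 contrasts the first part with the geometric mean of the other two.
    z1 = math.sqrt(2.0 / 3.0) * math.log(x1 / math.sqrt(x2 * x3))
    # z2 contrasts the second part with the third.
    z2 = math.sqrt(1.0 / 2.0) * math.log(x2 / x3)
    return z1, z2


if __name__ == "__main__":
    print(ilr_three_part(0.2, 0.3, 0.5))
    print(ilr_three_part(20, 30, 50))  # same composition in percentages
```

With this basis, the squared length `z1**2 + z2**2` equals one third of the sum of the squared pairwise log-ratios of the parts, which is one way to see how log-ratios of the original parts tie into the orthonormal coordinates.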
In two articles, of which this is the first, two examples of statistical analysis by factorial methods are presented. The mathematical framework of the exposition is algebraic. The present formulation of these problems draws on teaching experience at the UER de Mathématiques, Logique Formelle et Informatique of the Université René-Descartes, as well as on a write-up published in the proceedings of the colloquium «Analyse des données en architecture et urbanisme» [5].
This text is the continuation of the article «Deux méthodes linéaires en statistique multidimensionnelle», published in issue no. 44 of this journal. Here we are concerned with tables of counts. The theory of Section 1.2 is applied to obtain the results: determination of the principal components and principal axes, construction of the plots, indices, and joint analyses of the two point clouds associated with the data table. Emphasis is placed on some common difficulties in interpreting the results. Several...
Conditional bias has been proposed as an influence diagnostic in various statistical models and techniques. To give an overall view of the usefulness of the concept, this paper presents a general review of it, relating it to the sensitivity curve and the sample influence curve. In addition, possible lines of work are pointed out that would make it possible to address influence analysis through this approach in a wide variety of statistical techniques.
In this paper the search for the true number of latent factors in the exploratory factor analysis model is studied through a comparison between the log-likelihood ratio test statistic, the information criteria of Akaike, Schwarz and Hannan-Quinn, and a cross-validation procedure. In a simulation study, a priori knowledge of the exact factor structure is used to evaluate the performance of the different methods.
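The three information criteria named above can be computed from a fitted model's maximized log-likelihood. The sketch below uses their standard textbook definitions; the function names are hypothetical, and the parameter count assumes an orthogonal factor model with `p` observed variables and `k` factors (loadings plus unique variances, minus rotational constraints), which may differ from the parameterization used in the paper.

```python
import math


def fa_free_parameters(p, k):
    """Free parameters of an orthogonal k-factor model on p variables (assumed form)."""
    return p * k + p - k * (k - 1) // 2


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC (Schwarz) and Hannan-Quinn values; lower is better."""
    if n_obs < 3:
        raise ValueError("n_obs must be at least 3 so that log(log(n_obs)) > 0")
    deviance = -2.0 * log_likelihood
    return {
        "aic": deviance + 2.0 * n_params,
        "bic": deviance + n_params * math.log(n_obs),
        "hq": deviance + 2.0 * n_params * math.log(math.log(n_obs)),
    }


def select_n_factors(log_likelihoods, p, n_obs, criterion="bic"):
    """Pick the number of factors minimizing the chosen criterion.

    log_likelihoods: dict mapping a candidate number of factors k to the
    maximized log-likelihood of the k-factor model (supplied by the caller).
    """
    scores = {
        k: information_criteria(ll, fa_free_parameters(p, k), n_obs)[criterion]
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Made-up log-likelihood values, for illustration only.
    print(select_n_factors({1: -1250.0, 2: -1200.0, 3: -1198.0}, p=6, n_obs=200))
```

Because the penalties grow at different rates with the sample size, the three criteria can disagree on the selected number of factors, which is the kind of behaviour a simulation with known factor structure can expose.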
The general theory of factorial analysis of continuous correspondence (FACC) is used to investigate the binary case of a continuous probability measure defined as $T(x,y) = a y^n + b$ for $(x,y) \in D$, $n \in \mathbb{N}$, and $T(x,y) = 0$ elsewhere, where $n \ge 0$, $a$ and $b$ are the parameters of this distribution, and the domain $D$ is a variable trapezoid inscribed in the unit square. The trapezoid depends on two parameters α and β. This problem is solved. As special cases of our problem we obtain a complete solution for...