Finite mixture modelling of class-conditional distributions is a standard method in statistical pattern recognition. Using a bag-of-words vector representation of documents, this paper explores the use of a mixture of multinomial distributions as a model of the class-conditional distribution for the multiclass text document classification task. Experimental comparison of the proposed model and the standard Bernoulli and multinomial models as well as the model based on mixture of multivariate Bernoulli distributions...
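As a minimal sketch of the kind of model this abstract describes, the Python below scores a bag-of-words count vector under a mixture of multinomial distributions and picks the class with the highest posterior score. It is an illustration under stated assumptions, not the paper's implementation: the function names (`log_multinomial`, `log_mixture`, `classify`), the data layout, and the toy parameters are all hypothetical, and parameter estimation (for example by EM) is not shown.

```python
import math

def log_multinomial(counts, probs):
    """Log-probability of a word-count vector under one multinomial component."""
    n = sum(counts)
    # Log of the multinomial coefficient n! / (x_1! * ... * x_V!).
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    log_terms = 0.0
    for x, p in zip(counts, probs):
        if x > 0:
            if p <= 0.0:
                return -math.inf  # an observed word has zero probability
            log_terms += x * math.log(p)
    return log_coef + log_terms

def log_mixture(counts, weights, components):
    """Log of sum_m w_m * Multinomial(counts | theta_m), via log-sum-exp."""
    logs = [math.log(w) + log_multinomial(counts, p)
            for w, p in zip(weights, components) if w > 0.0]
    top = max(logs)
    if top == -math.inf:
        return -math.inf
    return top + math.log(sum(math.exp(v - top) for v in logs))

def classify(counts, priors, class_models):
    """Return the class maximising log prior + log class-conditional mixture."""
    scores = {c: math.log(priors[c]) + log_mixture(counts, *class_models[c])
              for c in priors}
    return max(scores, key=scores.get)

# Toy example (hypothetical parameters): two-word vocabulary, two classes.
priors = {"a": 0.5, "b": 0.5}
class_models = {
    "a": ([0.5, 0.5], [[0.9, 0.1], [0.7, 0.3]]),  # (weights, component word probabilities)
    "b": ([1.0], [[0.1, 0.9]]),
}
label = classify([3, 0], priors, class_models)
```

With a single component per class, this reduces to the standard multinomial model that the abstract names as a baseline.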
The paper gives an overview of feature selection techniques in statistical pattern recognition, with particular emphasis on methods developed by the Institute of Information Theory and Automation research team in recent years. Besides discussing the advances in methodology since the time of Perez’s pioneering work, the paper attempts to put the methods into a taxonomical framework. The methods discussed include the latest variants of the optimal algorithms, enhanced sub-optimal techniques...
The paper briefly reviews recent advances in the methodology of feature selection (FS) and the conceptual basis of a consulting system for solving FS problems. The reasons for designing an expert or consulting system to guide a less experienced user are outlined. The paper also attempts to provide guidance on which approach to choose, depending on the extent of a priori knowledge of the problem. The methods discussed here form the core of the software package being developed for...
The purpose of feature selection in machine learning is at least two-fold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of the models and the classification rate of classifiers on previously unseen data. Yet it has been shown recently that the process of feature selection itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit...