Text document classification based on mixture models
Jana Novovičová; Antonín Malík
Kybernetika (2004)
- Volume: 40, Issue: 3, page [293]-304
- ISSN: 0023-5954
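Abstract
Finite mixture modelling of class-conditional distributions is a standard method in a statistical pattern recognition. This paper, using bag-of-words vector document representation, explores the use of the mixture of multinomial distributions as a model for class-conditional distribution for multiclass text document classification task. Experimental comparison of the proposed model and the standard Bernoulli and multinomial models as well as the model based on mixture of multivariate Bernoulli distributions was performed using Reuters-21578 and Newsgroups data sets. Preliminary experimental results indicate the effectiveness of the proposed model in a text classification problem.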
How to cite
Novovičová, Jana, and Malík, Antonín. "Text document classification based on mixture models." Kybernetika 40.3 (2004): [293]-304. <http://eudml.org/doc/33701>.
@article{Novovičová2004,
abstract = {Finite mixture modelling of class-conditional distributions is a standard method in a statistical pattern recognition. This paper, using bag-of-words vector document representation, explores the use of the mixture of multinomial distributions as a model for class-conditional distribution for multiclass text document classification task. Experimental comparison of the proposed model and the standard Bernoulli and multinomial models as well as the model based on mixture of multivariate Bernoulli distributions was performed using Reuters-21578 and Newsgroups data sets. Preliminary experimental results indicate the effectiveness of the proposed model in a text classification problem.},
author = {Novovičová, Jana and Malík, Antonín},
journal = {Kybernetika},
keywords = {text classification; multinomial mixture model},
language = {eng},
number = {3},
pages = {[293]-304},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Text document classification based on mixture models},
url = {http://eudml.org/doc/33701},
volume = {40},
year = {2004},
}
TY - JOUR
AU - Novovičová, Jana
AU - Malík, Antonín
TI - Text document classification based on mixture models
JO - Kybernetika
PY - 2004
PB - Institute of Information Theory and Automation AS CR
VL - 40
IS - 3
SP - [293]
EP - 304
AB - Finite mixture modelling of class-conditional distributions is a standard method in a statistical pattern recognition. This paper, using bag-of-words vector document representation, explores the use of the mixture of multinomial distributions as a model for class-conditional distribution for multiclass text document classification task. Experimental comparison of the proposed model and the standard Bernoulli and multinomial models as well as the model based on mixture of multivariate Bernoulli distributions was performed using Reuters-21578 and Newsgroups data sets. Preliminary experimental results indicate the effectiveness of the proposed model in a text classification problem.
LA - eng
KW - text classification; multinomial mixture model
UR - http://eudml.org/doc/33701
ER -
References
- Battiti R., IEEE Trans. Neural Networks 5 (1994), 537–550, DOI 10.1109/72.298224
- Dempster A. P., Laird N. M., Rubin D. B., Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. Ser. B 39 (1977), 1–38, Zbl 0364.62022, MR 0501537
- Forman G., An experimental study of feature selection metrics for text categorization, J. Mach. Learning Res. 3 (2003), 1289–1305
- Joachims T., Text categorization with support vector machines: Learning with many relevant features, In: Proc. 10th European Conference on Machine Learning (ECML’98), 1998, pp. 137–142
- Juan A., Vidal E., Pattern Recognition 35 (2002), 2705–2710, DOI 10.1016/S0031-3203(01)00242-4
- Kwak N., Choi C., Improved mutual information feature selector for neural networks in supervised learning, In: Proc. Internat. Joint Conference on Neural Networks (IJCNN ’99), 1999, pp. 1313–1318
- McCallum A., Nigam K., A comparison of event models for naive Bayes text classification, In: Proc. AAAI-98 Workshop on Learning for Text Categorization, 1998
- McLachlan G. J., Peel D., Finite Mixture Models, Wiley, New York 2000, Zbl 0963.62061, MR 1789474
- Mladenic D., Grobelnik M., Feature selection for unbalanced class distribution and Naive Bayes, In: Proc. Sixteenth Internat. Conference on Machine Learning, 1999, pp. 258–267
- Nigam K., McCallum A., Thrun S., Mitchell T., Mach. Learning 39 (2000), 103–134, Zbl 0949.68162, DOI 10.1023/A:1007692713085
- Novovičová J., Pudil P., Kittler J., IEEE Trans. Pattern Anal. Machine Intell. 18 (1996), 218–223, DOI 10.1109/34.481557
- Novovičová J., Malík A., Text Document Classification Using Finite Mixtures, Research Report No. 2063, Institute of Information Theory and Automation, Prague 2002
- Novovičová J., Malík A., Application of multinomial mixture model to text classification, In: Pattern Recognition and Image Analysis (Lecture Notes in Computer Science 2652), Springer–Verlag, Berlin 2003, pp. 646–653
- Novovičová J., Malík A., Pudil P., Feature selection using improved mutual information for text classification, In: Structural, Syntactic and Statistical Pattern Recognition (Lecture Notes in Computer Science), Springer–Verlag, Berlin 2004 (in press), Zbl 1104.68663
- Pudil P., Novovičová J., Kittler J., Pattern Recognition 28 (1995), 1389–1398, DOI 10.1016/0031-3203(94)00009-B
- Ueda N., Saito K., Parametric mixture models for multi-labeled text, In: Proc. Neural Information Processing Systems, 2003
- Yang Y., Pedersen J. O., A comparative study on feature selection in text categorization, In: Proc. Internat. Conference on Machine Learning, 1997, pp. 412–420
- Yang Y., Liu X., A re-examination of text categorization methods, In: Proc. 22nd Internat. ACM SIGIR Conference on Research and Development in Inform. Retrieval, 1999, pp. 42–49
- Yang Y., J. Inform. Retrieval 1 (1999), 67–88, DOI 10.1023/A:1009982220290
- Yang Y., Zhang J., Kisiel B., A scalability analysis of classifier in text categorization, In: Proc. 26th ACM SIGIR Conference on Research and Development in Inform. Retrieval, 2003