Hierarchical text categorization using fuzzy relational thesaurus

Domonkos Tikk; Jae Dong Yang; Sun Lee Bang

Kybernetika (2003)

  • Volume: 39, Issue: 5, page [583]-600
  • ISSN: 0023-5954

Abstract

Text categorization is the task of assigning a text document to an appropriate category from a predefined set of categories. We present a new approach to text categorization by means of a Fuzzy Relational Thesaurus (FRT). An FRT is a multilevel category system that stores and maintains an adaptive local dictionary for each category. The goal of our approach is twofold: to develop a reliable text categorization method for a given subject domain, and to expand the initial FRT with automatically added terms, thereby obtaining an incrementally defined knowledge base of the domain. We implemented the categorization algorithm and compared it with several other hierarchical classifiers. Experimental results show that our algorithm outperforms its rivals on all document corpora investigated.

How to cite

Tikk, Domonkos, Yang, Jae Dong, and Bang, Sun Lee. "Hierarchical text categorization using fuzzy relational thesaurus." Kybernetika 39.5 (2003): [583]-600. <http://eudml.org/doc/33667>.

@article{Tikk2003,
abstract = {Text categorization is the task of assigning a text document to an appropriate category from a predefined set of categories. We present a new approach to text categorization by means of a Fuzzy Relational Thesaurus (FRT). An FRT is a multilevel category system that stores and maintains an adaptive local dictionary for each category. The goal of our approach is twofold: to develop a reliable text categorization method for a given subject domain, and to expand the initial FRT with automatically added terms, thereby obtaining an incrementally defined knowledge base of the domain. We implemented the categorization algorithm and compared it with several other hierarchical classifiers. Experimental results show that our algorithm outperforms its rivals on all document corpora investigated.},
author = {Tikk, Domonkos and Yang, Jae Dong and Bang, Sun Lee},
journal = {Kybernetika},
keywords = {text mining; knowledge base management; multi-level categorization; hierarchical text categorization},
language = {eng},
number = {5},
pages = {[583]-600},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Hierarchical text categorization using fuzzy relational thesaurus},
url = {http://eudml.org/doc/33667},
volume = {39},
year = {2003},
}

TY - JOUR
AU - Tikk, Domonkos
AU - Yang, Jae Dong
AU - Bang, Sun Lee
TI - Hierarchical text categorization using fuzzy relational thesaurus
JO - Kybernetika
PY - 2003
PB - Institute of Information Theory and Automation AS CR
VL - 39
IS - 5
SP - [583]
EP - 600
AB - Text categorization is the task of assigning a text document to an appropriate category from a predefined set of categories. We present a new approach to text categorization by means of a Fuzzy Relational Thesaurus (FRT). An FRT is a multilevel category system that stores and maintains an adaptive local dictionary for each category. The goal of our approach is twofold: to develop a reliable text categorization method for a given subject domain, and to expand the initial FRT with automatically added terms, thereby obtaining an incrementally defined knowledge base of the domain. We implemented the categorization algorithm and compared it with several other hierarchical classifiers. Experimental results show that our algorithm outperforms its rivals on all document corpora investigated.
LA - eng
KW - text mining; knowledge base management; multi-level categorization; hierarchical text categorization
UR - http://eudml.org/doc/33667
ER -

