Multi-label classification using error correcting output codes

Tomasz Kajdanowicz; Przemysław Kazienko

International Journal of Applied Mathematics and Computer Science (2012)

  • Volume: 22, Issue: 4, page 829-840
  • ISSN: 1641-876X

Abstract

A framework for multi-label classification extended by Error Correcting Output Codes (ECOCs) is introduced and empirically examined in the article. The solution treats the base multi-label classifiers as a noisy channel and applies ECOCs to recover from the classification errors made by individual classifiers. The framework was examined through exhaustive studies over combinations of three distinct classification algorithms and four ECOC methods employed in the multi-label classification problem. The experimental results revealed that (i) the Bose-Chaudhuri-Hocquenghem (BCH) code matched with any multi-label classifier results in better classification quality; (ii) the accuracy of the binary relevance classification method strongly depends on the coding scheme; (iii) the label power-set and the RAkEL classifier require the same computation time irrespective of the coding utilized; (iv) in general, these two classifiers are not suitable for ECOCs because they cannot benefit from the error-correcting abilities of ECOCs; (v) the all-pairs code combined with binary relevance is not suitable for datasets with larger label sets.
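
The noisy-channel idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple repetition code (chosen only for brevity, not claimed to be one of the four codes in the study), and the function names `encode` and `decode` are hypothetical. Each bit of the codeword stands for the output of one binary classifier, and flipped bits stand for classifier errors.

```python
from typing import List


def encode(labels: List[int], r: int = 3) -> List[int]:
    """Encode a binary label vector by repeating each bit r times (assumed repetition code)."""
    return [bit for bit in labels for _ in range(r)]


def decode(codeword: List[int], r: int = 3) -> List[int]:
    """Decode by majority vote over each consecutive group of r bits."""
    return [
        int(sum(codeword[i:i + r]) * 2 > r)
        for i in range(0, len(codeword), r)
    ]


# True label set of one instance, written as a binary vector over four labels.
labels = [1, 0, 1, 1]

# Each of the 12 codeword bits would be predicted by its own binary classifier.
codeword = encode(labels)

# Simulate the noisy channel: three classifiers make a mistake,
# at most one per group of three bits.
noisy = codeword.copy()
for i in (1, 5, 9):
    noisy[i] ^= 1

# Majority-vote decoding recovers the original label vector.
recovered = decode(noisy)
```

With `r = 3` the sketch tolerates one wrong prediction per label; stronger codes such as BCH trade codeword length against correction capability in a less wasteful way.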

How to cite


Tomasz Kajdanowicz and Przemysław Kazienko. "Multi-label classification using error correcting output codes." International Journal of Applied Mathematics and Computer Science 22.4 (2012): 829-840. <http://eudml.org/doc/244515>.

@article{TomaszKajdanowicz2012,
abstract = {A framework for multi-label classification extended by Error Correcting Output Codes (ECOCs) is introduced and empirically examined in the article. The solution treats the base multi-label classifiers as a noisy channel and applies ECOCs to recover from the classification errors made by individual classifiers. The framework was examined through exhaustive studies over combinations of three distinct classification algorithms and four ECOC methods employed in the multi-label classification problem. The experimental results revealed that (i) the Bose-Chaudhuri-Hocquenghem (BCH) code matched with any multi-label classifier results in better classification quality; (ii) the accuracy of the binary relevance classification method strongly depends on the coding scheme; (iii) the label power-set and the RAkEL classifier require the same computation time irrespective of the coding utilized; (iv) in general, these two classifiers are not suitable for ECOCs because they cannot benefit from the error-correcting abilities of ECOCs; (v) the all-pairs code combined with binary relevance is not suitable for datasets with larger label sets.},
author = {Tomasz Kajdanowicz and Przemysław Kazienko},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {machine learning; supervised learning; multi-label classification; error-correcting output codes; ECOC; ensemble methods; binary relevance; framework; AdaBoostSeq; structured output learning; structured output prediction; structured prediction; sequence prediction; sequential output prediction; ensemble method; classifier fusion; classification},
language = {eng},
number = {4},
pages = {829-840},
title = {Multi-label classification using error correcting output codes},
url = {http://eudml.org/doc/244515},
volume = {22},
year = {2012},
}

TY - JOUR
AU - Tomasz Kajdanowicz
AU - Przemysław Kazienko
TI - Multi-label classification using error correcting output codes
JO - International Journal of Applied Mathematics and Computer Science
PY - 2012
VL - 22
IS - 4
SP - 829
EP - 840
AB - A framework for multi-label classification extended by Error Correcting Output Codes (ECOCs) is introduced and empirically examined in the article. The solution treats the base multi-label classifiers as a noisy channel and applies ECOCs to recover from the classification errors made by individual classifiers. The framework was examined through exhaustive studies over combinations of three distinct classification algorithms and four ECOC methods employed in the multi-label classification problem. The experimental results revealed that (i) the Bose-Chaudhuri-Hocquenghem (BCH) code matched with any multi-label classifier results in better classification quality; (ii) the accuracy of the binary relevance classification method strongly depends on the coding scheme; (iii) the label power-set and the RAkEL classifier require the same computation time irrespective of the coding utilized; (iv) in general, these two classifiers are not suitable for ECOCs because they cannot benefit from the error-correcting abilities of ECOCs; (v) the all-pairs code combined with binary relevance is not suitable for datasets with larger label sets.
LA - eng
KW - machine learning; supervised learning; multi-label classification; error-correcting output codes; ECOC; ensemble methods; binary relevance; framework; AdaBoostSeq; structured output learning; structured output prediction; structured prediction; sequence prediction; sequential output prediction; ensemble method; classifier fusion; classification
UR - http://eudml.org/doc/244515
ER -

