The use of information and information gain in the analysis of attribute dependencies

Krzysztof Moliński; Anita Dobek; Kamila Tomaszyk

Biometrical Letters (2012)

  • Volume: 49, Issue: 2, page 149-158
  • ISSN: 1896-3811

Abstract

This paper demonstrates the possible conclusions which can be drawn from an analysis of entropy and information. Because of its universality, entropy can be widely used in different subjects, especially in biomedicine. Based on simulated data, the similarities and differences between the grouping of attributes and the testing of their independence are shown. It follows that a complete exploration of data sets requires both of these elements. A new concept introduced in this paper is that of normed information gain, allowing the use of any logarithm in the definition of entropy.
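The paper's exact definition of normed information gain is not reproduced in this record. The sketch below illustrates the underlying idea under one plausible assumption: information gain IG(Y; X) = H(Y) - H(Y | X) is divided by H(Y), so the ratio of logarithms cancels the base and the result is the same for any choice of logarithm. The function names and the choice of H(Y) as the normalizing term are illustrative, not the authors' notation.

```python
import math
from collections import Counter

def entropy(xs, base=2.0):
    """Shannon entropy of a sequence of discrete attribute values."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n, base) for c in Counter(xs).values())

def information_gain(xs, ys, base=2.0):
    """IG(Y; X) = H(Y) - H(Y | X): reduction in uncertainty about ys
    after observing xs (equal to the mutual information I(X; Y))."""
    n = len(xs)
    h_y_given_x = 0.0
    for x in set(xs):
        sub = [y for xi, y in zip(xs, ys) if xi == x]
        h_y_given_x += (len(sub) / n) * entropy(sub, base)
    return entropy(ys, base) - h_y_given_x

def normed_information_gain(xs, ys, base=2.0):
    """Illustrative normalization (an assumption, not the paper's formula):
    IG(Y; X) / H(Y). Both numerator and denominator scale by the same
    log-base conversion factor, so the ratio is base-independent."""
    h_y = entropy(ys, base)
    return information_gain(xs, ys, base) / h_y if h_y > 0 else 0.0

# Perfectly associated attributes give normed gain 1; independent ones give 0,
# and the value is unchanged whether entropy is measured in bits or nats.
x = ['a', 'a', 'b', 'b']
y = [0, 1, 1, 1]
print(normed_information_gain(x, y, base=2.0))
print(normed_information_gain(x, y, base=math.e))
```

Because the normed gain is invariant to the logarithm base, entropies may be computed in bits, nats, or any other unit without changing the dependency measure — which is the practical point the abstract attributes to the new concept.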

How to cite


Krzysztof Moliński, Anita Dobek, and Kamila Tomaszyk. "The use of information and information gain in the analysis of attribute dependencies." Biometrical Letters 49.2 (2012): 149-158. <http://eudml.org/doc/268905>.

@article{KrzysztofMoliński2012,
abstract = {This paper demonstrates the possible conclusions which can be drawn from an analysis of entropy and information. Because of its universality, entropy can be widely used in different subjects, especially in biomedicine. Based on simulated data the similarities and differences between the grouping of attributes and testing of their independencies are shown. It follows that a complete exploration of data sets requires both of these elements. A new concept introduced in this paper is that of normed information gain, allowing the use of any logarithm in the definition of entropy.},
author = {Krzysztof Moliński and Anita Dobek and Kamila Tomaszyk},
journal = {Biometrical Letters},
keywords = {dendrogram; entropy; information gain},
language = {eng},
number = {2},
pages = {149-158},
title = {The use of information and information gain in the analysis of attribute dependencies},
url = {http://eudml.org/doc/268905},
volume = {49},
year = {2012},
}

TY - JOUR
AU - Krzysztof Moliński
AU - Anita Dobek
AU - Kamila Tomaszyk
TI - The use of information and information gain in the analysis of attribute dependencies
JO - Biometrical Letters
PY - 2012
VL - 49
IS - 2
SP - 149
EP - 158
AB - This paper demonstrates the possible conclusions which can be drawn from an analysis of entropy and information. Because of its universality, entropy can be widely used in different subjects, especially in biomedicine. Based on simulated data the similarities and differences between the grouping of attributes and testing of their independencies are shown. It follows that a complete exploration of data sets requires both of these elements. A new concept introduced in this paper is that of normed information gain, allowing the use of any logarithm in the definition of entropy.
LA - eng
KW - dendrogram; entropy; information gain
UR - http://eudml.org/doc/268905
ER -

References

  1. Bezzi M. (2007): Quantifying the information transmitted in a single stimulus. Biosystems 89: 4-9.
  2. Brunsell N.A. (2010): A multiscale information theory approach to assess spatial-temporal variability of daily precipitation. Journal of Hydrology 385: 165-172.
  3. Jakulin A. (2005): Machine learning based on attribute interactions. PhD dissertation, University of Ljubljana.
  4. Jakulin A., Bratko I., Smrke D., Demsar J., Zupan B. (2003): Attribute interactions in medical data analysis. In: 9th Conference on Artificial Intelligence in Medicine in Europe (AIME 2003), October 18-22, 2003, Protaras, Cyprus.
  5. Jakulin A., Bratko I. (2003): Analyzing attribute dependencies. In: 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003), September 22-26, Cavtat, Croatia.
  6. Jakulin A., Bratko I. (2004a): Quantifying and visualizing attribute interactions: an approach based on entropy. http://arxiv.org/abs/cs.AI/0308002v3.
  7. Jakulin A., Bratko I. (2004b): Testing the significance of attribute interactions. In: Proc. 21st International Conference on Machine Learning, Banff, Canada.
  8. Kang G., Yue W., Zhang J., Cui Y., Zuo Y., Zhang D. (2008): An entropy-based approach for testing genetic epistasis underlying complex diseases. Journal of Theoretical Biology 250: 362-374.
  9. Kullback S., Leibler R.A. (1951): On information and sufficiency. Annals of Mathematical Statistics 22(1): 79-86. Zbl0042.38403
  10. Matsuda H. (2000): Physical nature of higher-order mutual information: intrinsic correlation and frustration. Physical Review E 62: 3096-3102.
  11. McGill W.J. (1954): Multivariate information transmission. Psychometrika 19(2): 97-116. Zbl0058.35706
  12. Moniz L.J., Cooch E.G., Ellner S.P., Nichols J.D., Nichols J.M. (2007): Application of information theory methods to food web reconstruction. Ecological Modelling 208: 145-158.
  13. Moore J.H., Gilbert J.C., Tsai C.-T., Chiang F.-T., Holden T., Barney N., White B.C. (2006): A flexible computational framework for detecting, characterizing, and interpreting statistical patterns of epistasis in genetic studies of human disease susceptibility. Journal of Theoretical Biology 241: 252-261.
  14. Rajski C. (1961): A metric space of discrete probability distributions. Information and Control 4: 373-377. Zbl0103.35805
  15. Shannon C. (1948): A mathematical theory of communication. Bell System Technical Journal 27: 379-423, 623-656. Zbl1154.94303
  16. Yan Z., Wang Z., Xie H. (2008): The application of mutual information-based feature selection and fuzzy LS-SVM-based classifier in motion classification. Computer Methods and Programs in Biomedicine 90: 275-284.
