Significance tests to identify regulated proteins based on a large number of small samples

Frank Klawonn

Kybernetika (2012)

  • Volume: 48, Issue: 3, pages 478-493
  • ISSN: 0023-5954

Abstract

Modern biology is interested in better understanding mechanisms within cells. For this purpose, products of cells like metabolites, peptides, proteins or mRNA are measured and compared under different conditions, for instance healthy cells vs. infected cells. Such experiments usually yield regulation or expression values – the abundance or absence of a cell product in one condition compared to another one – for a large number of cell products, but with only a few replicates. In order to distinguish random fluctuations and noise from true regulations, suitable significance tests are needed. Here we propose a simple model which is based on the assumption that the regulation factors follow normal distributions with different expected values, but with the same standard deviation. Before suitable significance tests can be derived from this model, a reliable estimate of the standard deviation in the context of many small samples is needed. We therefore also include a discussion of the properties of the sample MAD (Median Absolute Deviation from the median) and the sample standard deviation for small sample sizes.
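
The model described in the abstract (normally distributed regulation values with product-specific expected values and a shared standard deviation) can be illustrated with a small simulation. The following is a minimal sketch, not the paper's procedure: the function names, the parameter values, the uniform range used for the expected values, and the scaling constant 1.4826 (the usual large-sample consistency factor for the MAD under normality) are all assumptions chosen for illustration.

```python
import random
import statistics

# Assumed scaling: large-sample consistency factor of the MAD for normal data.
MAD_SCALE = 1.4826


def scaled_mad(sample):
    """Median absolute deviation from the median, scaled by MAD_SCALE."""
    med = statistics.median(sample)
    deviations = [abs(x - med) for x in sample]
    return MAD_SCALE * statistics.median(deviations)


def simulate(num_products=2000, replicates=4, sigma=0.5, seed=0):
    """Average the two spread estimates over many small samples.

    Each hypothetical cell product gets its own expected value, but all
    products share the same standard deviation sigma.
    """
    rng = random.Random(seed)
    sds, mads = [], []
    for _ in range(num_products):
        mu = rng.uniform(-2.0, 2.0)  # product-specific expected value (assumed range)
        sample = [rng.gauss(mu, sigma) for _ in range(replicates)]
        sds.append(statistics.stdev(sample))  # sample standard deviation (n - 1)
        mads.append(scaled_mad(sample))
    return statistics.mean(sds), statistics.mean(mads)


if __name__ == "__main__":
    mean_sd, mean_mad = simulate()
    print(f"true sigma: 0.5, mean sample SD: {mean_sd:.3f}, mean scaled MAD: {mean_mad:.3f}")
```

With only a few replicates per product, both averaged estimates would typically be expected to fall below the true standard deviation, which is why the small-sample behaviour of these estimators matters before any significance test is built on them.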

How to cite

Klawonn, Frank. "Significance tests to identify regulated proteins based on a large number of small samples." Kybernetika 48.3 (2012): 478-493. <http://eudml.org/doc/246400>.

@article{Klawonn2012,
abstract = {Modern biology is interested in better understanding mechanisms within cells. For this purpose, products of cells like metabolites, peptides, proteins or mRNA are measured and compared under different conditions, for instance healthy cells vs. infected cells. Such experiments usually yield regulation or expression values – the abundance or absence of a cell product in one condition compared to another one – for a large number of cell products, but with only a few replicates. In order to distinguish random fluctuations and noise from true regulations, suitable significance tests are needed. Here we propose a simple model which is based on the assumption that the regulation factors follow normal distributions with different expected values, but with the same standard deviation. Before suitable significance tests can be derived from this model, a reliable estimate of the standard deviation in the context of many small samples is needed. We therefore also include a discussion of the properties of the sample MAD (Median Absolute Deviation from the median) and the sample standard deviation for small sample sizes.},
author = {Klawonn, Frank},
journal = {Kybernetika},
keywords = {MAD; standard deviation; small samples; significance test},
language = {eng},
number = {3},
pages = {478-493},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Significance tests to identify regulated proteins based on a large number of small samples},
url = {http://eudml.org/doc/246400},
volume = {48},
year = {2012},
}

TY - JOUR
AU - Klawonn, Frank
TI - Significance tests to identify regulated proteins based on a large number of small samples
JO - Kybernetika
PY - 2012
PB - Institute of Information Theory and Automation AS CR
VL - 48
IS - 3
SP - 478
EP - 493
AB - Modern biology is interested in better understanding mechanisms within cells. For this purpose, products of cells like metabolites, peptides, proteins or mRNA are measured and compared under different conditions, for instance healthy cells vs. infected cells. Such experiments usually yield regulation or expression values – the abundance or absence of a cell product in one condition compared to another one – for a large number of cell products, but with only a few replicates. In order to distinguish random fluctuations and noise from true regulations, suitable significance tests are needed. Here we propose a simple model which is based on the assumption that the regulation factors follow normal distributions with different expected values, but with the same standard deviation. Before suitable significance tests can be derived from this model, a reliable estimate of the standard deviation in the context of many small samples is needed. We therefore also include a discussion of the properties of the sample MAD (Median Absolute Deviation from the median) and the sample standard deviation for small sample sizes.
LA - eng
KW - MAD; standard deviation; small samples; significance test
UR - http://eudml.org/doc/246400
ER -
