Statistical tools for discovering pseudo-periodicities in biological sequences

Bernard Prum; Élisabeth de Turckheim; Martin Vingron

ESAIM: Probability and Statistics (2010)

  • Volume: 5, page 171-181
  • ISSN: 1292-8100

Abstract

Many protein sequences present nontrivial periodicities, such as cysteine signatures and leucine heptads. These known periodicities probably represent only a small percentage of the periodic structures present in sequences, and it is useful to have general tools to detect such sequences and their periods in large sequence databases. We compare three statistics adapted from those used in time series analysis: a generalisation of the simple autocovariance based on a similarity score, and two statistics intended to increase the power of the method. The theoretical behaviour of these statistics is derived, and the corresponding tests are then described. We also present an application of these tests to a protein known to have sequence periodicity.
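
The idea of an autocovariance built on a similarity score can be illustrated with a minimal sketch. It is not the statistic or the test defined in the paper. It assumes an identity score (1 for identical residues, 0 otherwise) and centres the lag-wise mean similarity by its expectation under independent positions. All function names and the toy sequence are hypothetical.

```python
from collections import Counter


def identity_score(a, b):
    """Assumed similarity score: 1.0 for identical residues, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def lag_similarity(seq, lag, score=identity_score):
    """Mean similarity between residues that lie `lag` positions apart."""
    n_pairs = len(seq) - lag
    if lag < 1 or n_pairs < 1:
        raise ValueError("lag must satisfy 1 <= lag < len(seq)")
    return sum(score(seq[i], seq[i + lag]) for i in range(n_pairs)) / n_pairs


def independence_baseline(seq, score=identity_score):
    """Expected score of two residues drawn independently from the
    empirical residue frequencies of `seq`."""
    counts = Counter(seq)
    n = len(seq)
    return sum(
        (count_a / n) * (count_b / n) * score(a, b)
        for a, count_a in counts.items()
        for b, count_b in counts.items()
    )


def centred_lag_profile(seq, max_lag, score=identity_score):
    """Lag-wise mean similarity minus the independence baseline."""
    base = independence_baseline(seq, score)
    return {
        lag: lag_similarity(seq, lag, score) - base
        for lag in range(1, max_lag + 1)
    }


# Toy sequence with an exact period of 7 (illustrative only, not real data).
toy = "CAGTLMN" * 5
profile = centred_lag_profile(toy, 10)
best_lag = max(profile, key=profile.get)
print(best_lag, profile[best_lag])
```

On this toy input the largest centred value falls at lag 7. A substitution-matrix score could be passed in place of `identity_score`. Deciding whether a peak is significant requires the distributional results derived in the paper, which this sketch does not reproduce.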

How to cite

Prum, Bernard, de Turckheim, Élisabeth, and Vingron, Martin. "Statistical tools for discovering pseudo-periodicities in biological sequences." ESAIM: Probability and Statistics 5 (2010): 171-181. <http://eudml.org/doc/197759>.

@article{Prum2010,
abstract = {
Many protein sequences present nontrivial periodicities, such as cysteine signatures and leucine heptads. These known periodicities probably represent only a small percentage of the periodic structures present in sequences, and it is useful to have general tools to detect such sequences and their periods in large sequence databases. We compare three statistics adapted from those used in time series analysis: a generalisation of the simple autocovariance based on a similarity score, and two statistics intended to increase the power of the method. The theoretical behaviour of these statistics is derived, and the corresponding tests are then described. We also present an application of these tests to a protein known to have sequence periodicity. },
author = {Prum, Bernard and de Turckheim, Élisabeth and Vingron, Martin},
journal = {ESAIM: Probability and Statistics},
keywords = {Biological sequences; proteins; periodicity; autocovariance function.},
language = {eng},
month = {3},
pages = {171-181},
publisher = {EDP Sciences},
title = {Statistical tools for discovering pseudo-periodicities in biological sequences},
url = {http://eudml.org/doc/197759},
volume = {5},
year = {2010},
}

TY - JOUR
AU - Prum, Bernard
AU - de Turckheim, Élisabeth
AU - Vingron, Martin
TI - Statistical tools for discovering pseudo-periodicities in biological sequences
JO - ESAIM: Probability and Statistics
DA - 2010/3//
PB - EDP Sciences
VL - 5
SP - 171
EP - 181
AB - 
Many protein sequences present nontrivial periodicities, such as cysteine signatures and leucine heptads. These known periodicities probably represent only a small percentage of the periodic structures present in sequences, and it is useful to have general tools to detect such sequences and their periods in large sequence databases. We compare three statistics adapted from those used in time series analysis: a generalisation of the simple autocovariance based on a similarity score, and two statistics intended to increase the power of the method. The theoretical behaviour of these statistics is derived, and the corresponding tests are then described. We also present an application of these tests to a protein known to have sequence periodicity.
LA - eng
KW - Biological sequences; proteins; periodicity; autocovariance function.
UR - http://eudml.org/doc/197759
ER -

