Building the library of RNA 3D nucleotide conformations using the clustering approach

Tomasz Zok; Maciej Antczak; Martin Riedel; David Nebel; Thomas Villmann; Piotr Lukasiak; Jacek Blazewicz; Marta Szachniuk

International Journal of Applied Mathematics and Computer Science (2015)

  • Volume: 25, Issue: 3, pages 689-700
  • ISSN: 1641-876X

Abstract

An increasing number of known RNA 3D structures contributes to the recognition of various RNA families and identification of their features. These tasks are based on an analysis of RNA conformations conducted at different levels of detail. On the other hand, the knowledge of native nucleotide conformations is crucial for structure prediction and understanding of RNA folding. However, this knowledge is stored in structural databases in a rather distributed form. Therefore, only automated methods for sampling the space of RNA structures can reveal plausible conformational representatives useful for further analysis. Here, we present a machine learning-based approach to inspect the dataset of RNA three-dimensional structures and to create a library of nucleotide conformers. A median neural gas algorithm is applied to cluster nucleotide structures upon their trigonometric description. The clustering procedure is two-stage: (i) backbone- and (ii) ribose-driven. We show the resulting library that contains RNA nucleotide representatives over the entire data, and we evaluate its quality by computing normal distribution measures and average RMSD between data points as well as the prototype within each cluster.
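
The clustering step named in the abstract can be illustrated with a minimal sketch of batch median neural gas on angular data. This is not the authors' implementation. The sine/cosine embedding used as the "trigonometric description", the squared Euclidean dissimilarity, the annealing schedule, and all function names (`embed`, `sq_dist`, `median_neural_gas`, `assign`) are assumptions made for illustration, and the toy angles are invented rather than taken from any RNA structure.

```python
import math
import random

def embed(angles_deg):
    """Map torsion angles (degrees) to a flat (sin, cos) vector, so that
    angles on either side of the +/-180 wrap-around stay close together."""
    vec = []
    for a in angles_deg:
        r = math.radians(a)
        vec.extend((math.sin(r), math.cos(r)))
    return vec

def sq_dist(u, v):
    """Squared Euclidean distance between two embedded vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def median_neural_gas(points, n_prototypes, n_epochs=20, seed=0):
    """Return indices of the data points selected as prototypes.

    Prototypes are restricted to data points (the 'median' variant), so
    only the pairwise dissimilarity matrix d is needed.
    """
    n = len(points)
    d = [[sq_dist(p, q) for q in points] for p in points]
    protos = random.Random(seed).sample(range(n), n_prototypes)
    lam_start, lam_end = n_prototypes / 2.0, 0.01  # assumed schedule
    for epoch in range(n_epochs):
        # Shrink the neighbourhood range exponentially over the epochs.
        t = epoch / max(1, n_epochs - 1)
        lam = lam_start * (lam_end / lam_start) ** t
        # weights[i][j] = exp(-rank / lam), where rank is the position of
        # prototype i when prototypes are sorted by distance to point j.
        weights = [[0.0] * n for _ in range(n_prototypes)]
        for j in range(n):
            order = sorted(range(n_prototypes), key=lambda i: d[j][protos[i]])
            for rank, i in enumerate(order):
                weights[i][j] = math.exp(-rank / lam)
        # Median update: each prototype moves to the data point that
        # minimises its rank-weighted sum of dissimilarities.
        protos = [
            min(range(n),
                key=lambda l, i=i: sum(weights[i][j] * d[j][l] for j in range(n)))
            for i in range(n_prototypes)
        ]
    return protos

def assign(points, protos):
    """Label each point with the position of its nearest prototype."""
    return [min(range(len(protos)), key=lambda i: sq_dist(p, points[protos[i]]))
            for p in points]

# Toy data: one torsion angle per "nucleotide", in two groups. The second
# group straddles the +/-180 degree boundary.
toy_angles = [[55], [60], [65], [62], [175], [-178], [179], [-175]]
toy_points = [embed(a) for a in toy_angles]
toy_protos = median_neural_gas(toy_points, n_prototypes=2)
toy_labels = assign(toy_points, toy_protos)
print(toy_protos, toy_labels)
```

Because the prototypes are always members of the dataset, each cluster representative in such a scheme is an observed conformation rather than an averaged one, which is the property a conformer library needs. A two-stage variant, as the abstract describes, would presumably rerun the same procedure inside each first-stage cluster using a different subset of angles.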

How to cite

Tomasz Zok, et al. "Building the library of RNA 3D nucleotide conformations using the clustering approach." International Journal of Applied Mathematics and Computer Science 25.3 (2015): 689-700. <http://eudml.org/doc/271779>.

@article{TomaszZok2015,
abstract = {An increasing number of known RNA 3D structures contributes to the recognition of various RNA families and identification of their features. These tasks are based on an analysis of RNA conformations conducted at different levels of detail. On the other hand, the knowledge of native nucleotide conformations is crucial for structure prediction and understanding of RNA folding. However, this knowledge is stored in structural databases in a rather distributed form. Therefore, only automated methods for sampling the space of RNA structures can reveal plausible conformational representatives useful for further analysis. Here, we present a machine learning-based approach to inspect the dataset of RNA three-dimensional structures and to create a library of nucleotide conformers. A median neural gas algorithm is applied to cluster nucleotide structures upon their trigonometric description. The clustering procedure is two-stage: (i) backbone- and (ii) ribose-driven. We show the resulting library that contains RNA nucleotide representatives over the entire data, and we evaluate its quality by computing normal distribution measures and average RMSD between data points as well as the prototype within each cluster.},
author = {Tomasz Zok and Maciej Antczak and Martin Riedel and David Nebel and Thomas Villmann and Piotr Lukasiak and Jacek Blazewicz and Marta Szachniuk},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {RNA nucleotides; conformer library; torsion angles; clustering; neural gas},
language = {eng},
number = {3},
pages = {689--700},
title = {Building the library of RNA 3D nucleotide conformations using the clustering approach},
url = {http://eudml.org/doc/271779},
volume = {25},
year = {2015},
}

TY - JOUR
AU - Tomasz Zok
AU - Maciej Antczak
AU - Martin Riedel
AU - David Nebel
AU - Thomas Villmann
AU - Piotr Lukasiak
AU - Jacek Blazewicz
AU - Marta Szachniuk
TI - Building the library of RNA 3D nucleotide conformations using the clustering approach
JO - International Journal of Applied Mathematics and Computer Science
PY - 2015
VL - 25
IS - 3
SP - 689
EP - 700
AB - An increasing number of known RNA 3D structures contributes to the recognition of various RNA families and identification of their features. These tasks are based on an analysis of RNA conformations conducted at different levels of detail. On the other hand, the knowledge of native nucleotide conformations is crucial for structure prediction and understanding of RNA folding. However, this knowledge is stored in structural databases in a rather distributed form. Therefore, only automated methods for sampling the space of RNA structures can reveal plausible conformational representatives useful for further analysis. Here, we present a machine learning-based approach to inspect the dataset of RNA three-dimensional structures and to create a library of nucleotide conformers. A median neural gas algorithm is applied to cluster nucleotide structures upon their trigonometric description. The clustering procedure is two-stage: (i) backbone- and (ii) ribose-driven. We show the resulting library that contains RNA nucleotide representatives over the entire data, and we evaluate its quality by computing normal distribution measures and average RMSD between data points as well as the prototype within each cluster.
LA - eng
KW - RNA nucleotides; conformer library; torsion angles; clustering; neural gas
UR - http://eudml.org/doc/271779
ER -

