Tree compression pushdown automaton

Jan Janoušek; Bořivoj Melichar; Martin Poliak

Kybernetika (2012)

  • Volume: 48, Issue: 3, page 429-452
  • ISSN: 0023-5954

Abstract

top
A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with n nodes, the automaton has at most n + 1 states, its transition function cardinality is at most 4 n and there are 2 n + 1 pushdown store symbols. If hashing is used for storing automaton’s transitions, thus removing a factor of l o g n , the construction of the automaton takes linear time and space with respect to the length n of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.

How to cite

top

Janoušek, Jan, Melichar, Bořivoj, and Poliak, Martin. "Tree compression pushdown automaton." Kybernetika 48.3 (2012): 429-452. <http://eudml.org/doc/247233>.

@article{Janoušek2012,
abstract = {A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with $n$ nodes, the automaton has at most $n+1$ states, its transition function cardinality is at most $4n$ and there are $2n+1$ pushdown store symbols. If hashing is used for storing automaton’s transitions, thus removing a factor of $log n$, the construction of the automaton takes linear time and space with respect to the length $n$ of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.},
author = {Janoušek, Jan, Melichar, Bořivoj, Poliak, Martin},
journal = {Kybernetika},
keywords = {trees; pushdown automata; compression; indexing trees; arbology; deterministic pushdown automata; Tree Compression Automaton; trees; indexing trees; arbology; compression},
language = {eng},
number = {3},
pages = {429-452},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Tree compression pushdown automaton},
url = {http://eudml.org/doc/247233},
volume = {48},
year = {2012},
}

TY - JOUR
AU - Janoušek, Jan
AU - Melichar, Bořivoj
AU - Poliak, Martin
TI - Tree compression pushdown automaton
JO - Kybernetika
PY - 2012
PB - Institute of Information Theory and Automation AS CR
VL - 48
IS - 3
SP - 429
EP - 452
AB - A new kind of a deterministic pushdown automaton, called a Tree Compression Automaton, is presented. The tree compression automaton represents a complete compressed index of a set of trees for subtrees and accepts all subtrees of given trees. The algorithm for constructing our pushdown automaton is incremental. For a single tree with $n$ nodes, the automaton has at most $n+1$ states, its transition function cardinality is at most $4n$ and there are $2n+1$ pushdown store symbols. If hashing is used for storing automaton’s transitions, thus removing a factor of $log n$, the construction of the automaton takes linear time and space with respect to the length $n$ of the input tree(s). Our pushdown automaton construction can also be used for finding all subtree repeats without augmenting the overall complexity.
LA - eng
KW - trees; pushdown automata; compression; indexing trees; arbology; deterministic pushdown automata; Tree Compression Automaton; trees; indexing trees; arbology; compression
UR - http://eudml.org/doc/247233
ER -

References

top
  1. Arbology www pages:, Available on http://www.arbology.org/, March 2012. 
  2. Aho, A. V., Ullman, J. D., The Theory of Parsing, Translation, and Compiling, Prentice-Hall Englewood Cliffs, N.J. 1972. MR0408321
  3. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., Bourne, P. E., 10.1093/nar/28.1.235, Nucleic Acids Res. 28 (2000), 235–242. DOI10.1093/nar/28.1.235
  4. Bille, P., Pattern Matching in Trees and Strings, Ph.D. Thesis, FIT University of Copenhagen 2008. 
  5. Busatto, G., Lohrey, M., Maneth, S., Grammar-based Tree Compression, Technical Report 2004. 
  6. Cleophas, L., Tree Algorithms. Two Taxonomies and a Toolkit, Ph.D. Thesis, Technische Universiteit Eindhoven 2008. 
  7. Cole, R., Hariharan, R., Indyk, P., Tree pattern matching and subset matching in deterministic l o g 3 -time, SODA (1999), 245–254. 
  8. Comon, H., Dauchet, M., Gilleron, R., Löding, C., Jacquemard, F., Lugiez, D., Tison, S., Tommasi, M., Tree automata techniques and applications, Available on: http://www.grappa.univ-lille3.fr/tata (2007). 
  9. Gecseg, F., Steinby, M., Tree languages, In: Handbook of Formal Languages (G. Rozenberg and A. Salomaa, eds.), Vol. 3 Beyond Words. Springer-Verlag, Berlin 1997, pp. 1–68. MR1470018
  10. Flouri, T., Janoušek, J., Melichar, B., Subtree and tree pattern pushdown automata for trees in prefix notation (2009), Comput. Sci. Inform. Systems 7 (2010), 2, 331–357. 
  11. Hoffmann, Ch., O'Donnell, M., 10.1145/322290.322295, J. Assoc. Comput. Mach. 29 (1982), 1, 68–95. Zbl0477.68067MR0662611DOI10.1145/322290.322295
  12. Hopcroft, J. E., Motwani, R., Ullman, J. D., Introduction to Automata Theory, languages, and Computation. Second edition, Addison-Wesley, Boston 2001. MR0645539
  13. Wu, C. H., Apweiler, R., Bairoch, A., Natale, D. A., Barker, W. C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez et al, R., The Universal Protein Resource (UniProt) – An Expanding Universe of Protein Information, Database issue, Nucleic Acids Research, Oxford University Press, D187–D191 2006. 
  14. Schmidt, A. R., Waas, F., Kersten, M. L., Carey, M. J., Manolescu, I., Busse, R., XMark – A benchmark for XML data management, In: Proc. International Conference on Very Large Data Bases (VLDB), Hong Kong 2002, pp. 974–985. 
  15. Van Der Beek, L., Bouma, G., Malouf, R., Van Noord, G., Rijksuniversiteit Groningen, The Alpino dependency treebank, In: Computational Linguistics in the Netherlands (CLIN) 2002, pp. 1686–1691. 
  16. Welch, T., 10.1109/MC.1984.1659158, Computer 17 (1984), 6, 8–19. DOI10.1109/MC.1984.1659158
  17. Ziv, J., Lempel, A., 10.1109/TIT.1977.1055714, IEEE Trans. Inform. Theory 23 (1977), 3, 337–343. Zbl0379.94010MR0530215DOI10.1109/TIT.1977.1055714

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.