Implementation of directed acyclic word graph

Miroslav Balík

Kybernetika (2002)

  • Volume: 38, Issue: 1, page [91]-103
  • ISSN: 0023-5954

Abstract

top
An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text T is a minimal automaton that accepts all substrings of a text T , so it represents a complete index of the text. While all usual implementations of DAWG needed about 30 times larger storage space than was the size of the text, here we show an implementation that decreases this requirement down to four times the size of the text. The method uses a compression of DAWG elements, i. e. vertices, edges and labels. The construction time of this implementation is linear with respect to the size of the text, a search for a specific pattern is done in a linear time with respect to the size of the pattern. This implementation preserves both good properties of the DAWG automaton.

How to cite

top

Balík, Miroslav. "Implementation of directed acyclic word graph." Kybernetika 38.1 (2002): [91]-103. <http://eudml.org/doc/33569>.

@article{Balík2002,
abstract = {An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text $T$ is a minimal automaton that accepts all substrings of a text $T$, so it represents a complete index of the text. While all usual implementations of DAWG needed about 30 times larger storage space than was the size of the text, here we show an implementation that decreases this requirement down to four times the size of the text. The method uses a compression of DAWG elements, i. e. vertices, edges and labels. The construction time of this implementation is linear with respect to the size of the text, a search for a specific pattern is done in a linear time with respect to the size of the pattern. This implementation preserves both good properties of the DAWG automaton.},
author = {Balík, Miroslav},
journal = {Kybernetika},
keywords = {directed acyclic word graph automaton; string matching; data structures; directed acyclic word graph automaton; string matching; data structures},
language = {eng},
number = {1},
pages = {[91]-103},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Implementation of directed acyclic word graph},
url = {http://eudml.org/doc/33569},
volume = {38},
year = {2002},
}

TY - JOUR
AU - Balík, Miroslav
TI - Implementation of directed acyclic word graph
JO - Kybernetika
PY - 2002
PB - Institute of Information Theory and Automation AS CR
VL - 38
IS - 1
SP - [91]
EP - 103
AB - An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text $T$ is a minimal automaton that accepts all substrings of a text $T$, so it represents a complete index of the text. While all usual implementations of DAWG needed about 30 times larger storage space than was the size of the text, here we show an implementation that decreases this requirement down to four times the size of the text. The method uses a compression of DAWG elements, i. e. vertices, edges and labels. The construction time of this implementation is linear with respect to the size of the text, a search for a specific pattern is done in a linear time with respect to the size of the pattern. This implementation preserves both good properties of the DAWG automaton.
LA - eng
KW - directed acyclic word graph automaton; string matching; data structures; directed acyclic word graph automaton; string matching; data structures
UR - http://eudml.org/doc/33569
ER -

References

top
  1. Adámek J., Coding, MVŠT XXXI, SNTL, Prague 1989 (in Czech) (1989) 
  2. Anderson A., Nilson S., 10.1002/spe.4380250203, Software–Practice and Expirience 25 (1995), 129–141 (1995) DOI10.1002/spe.4380250203
  3. Balík M., String Matching in a Text, Diploma Thesis, CTU, Dept. of Computer Science and Engineering, Prague 1998 
  4. Crochemore M., Rytter W., Text Algorithms, Oxford University Press, New York 1994 Zbl1078.68151MR1307378
  5. Crochemore M., Vérin R., Direct construction of compact directed acyclic word graphs, In: CPM97 (A. Apostolico and J. Hein, eds., Lecture Notes in Computer Science 1264), Springer–Verlag, Berlin 1997, pp. 116–129 (1997) MR1608404
  6. Gonnet G. H., Baeza–Yates R., Handbook of Algorithms and Data Structures, Pascal and C. Addison–Wesley, Wokingham 1991 Zbl0719.68001
  7. Huffman D. A., A method for construction of minimum redundancy codes, Proc. IRE 40 (1952), 9, 1098–1101 (1952) 
  8. Irving R. W., Suffix Binary Search Trees, Technical Report TR-1995-7, Computing Science Department, University of Glasgow 199 
  9. Kärkkäinen J., Suffix cactus: A cross between suffix tree and suffix array, In: Proc. 6th Symposium on Combinatorial Pattern Matching, CPM95, 1995, pp. 191–204 (1995) MR1467515
  10. Kurtz S., 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O, Software–Practice and Experience 29 (1999), 13, 1149-1171 (1999) DOI10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O
  11. Melichar B., Approximate string matching by finite automata, In: Computer Analysis of Images and Patterns (Lecture Notes in Computer Science 970), Springer–Verlag, Berlin 1995 
  12. Melichar B., Fulltext Systems, Publishing House CTU, Prague 1996 (in Czech) (1996) 
  13. Melichar B., Pattern matching and finite automata, In: Proceedings of the Prague Stringology Club Workshop ’97, Prague 1997 

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.