Implementation of directed acyclic word graph
Kybernetika (2002)
- Volume: 38, Issue: 1, page [91]-103
- ISSN: 0023-5954
Access Full Article
topAbstract
topHow to cite
topBalík, Miroslav. "Implementation of directed acyclic word graph." Kybernetika 38.1 (2002): [91]-103. <http://eudml.org/doc/33569>.
@article{Balík2002,
abstract = {An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text $T$ is a minimal automaton that accepts all substrings of a text $T$, so it represents a complete index of the text. While all usual implementations of DAWG needed about 30 times larger storage space than was the size of the text, here we show an implementation that decreases this requirement down to four times the size of the text. The method uses a compression of DAWG elements, i. e. vertices, edges and labels. The construction time of this implementation is linear with respect to the size of the text, a search for a specific pattern is done in a linear time with respect to the size of the pattern. This implementation preserves both good properties of the DAWG automaton.},
author = {Balík, Miroslav},
journal = {Kybernetika},
keywords = {directed acyclic word graph automaton; string matching; data structures; directed acyclic word graph automaton; string matching; data structures},
language = {eng},
number = {1},
pages = {[91]-103},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Implementation of directed acyclic word graph},
url = {http://eudml.org/doc/33569},
volume = {38},
year = {2002},
}
TY - JOUR
AU - Balík, Miroslav
TI - Implementation of directed acyclic word graph
JO - Kybernetika
PY - 2002
PB - Institute of Information Theory and Automation AS CR
VL - 38
IS - 1
SP - [91]
EP - 103
AB - An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text $T$ is a minimal automaton that accepts all substrings of a text $T$, so it represents a complete index of the text. While all usual implementations of DAWG needed about 30 times larger storage space than was the size of the text, here we show an implementation that decreases this requirement down to four times the size of the text. The method uses a compression of DAWG elements, i. e. vertices, edges and labels. The construction time of this implementation is linear with respect to the size of the text, a search for a specific pattern is done in a linear time with respect to the size of the pattern. This implementation preserves both good properties of the DAWG automaton.
LA - eng
KW - directed acyclic word graph automaton; string matching; data structures; directed acyclic word graph automaton; string matching; data structures
UR - http://eudml.org/doc/33569
ER -
References
top- Adámek J., Coding, MVŠT XXXI, SNTL, Prague 1989 (in Czech) (1989)
- Anderson A., Nilson S., 10.1002/spe.4380250203, Software–Practice and Expirience 25 (1995), 129–141 (1995) DOI10.1002/spe.4380250203
- Balík M., String Matching in a Text, Diploma Thesis, CTU, Dept. of Computer Science and Engineering, Prague 1998
- Crochemore M., Rytter W., Text Algorithms, Oxford University Press, New York 1994 Zbl1078.68151MR1307378
- Crochemore M., Vérin R., Direct construction of compact directed acyclic word graphs, In: CPM97 (A. Apostolico and J. Hein, eds., Lecture Notes in Computer Science 1264), Springer–Verlag, Berlin 1997, pp. 116–129 (1997) MR1608404
- Gonnet G. H., Baeza–Yates R., Handbook of Algorithms and Data Structures, Pascal and C. Addison–Wesley, Wokingham 1991 Zbl0719.68001
- Huffman D. A., A method for construction of minimum redundancy codes, Proc. IRE 40 (1952), 9, 1098–1101 (1952)
- Irving R. W., Suffix Binary Search Trees, Technical Report TR-1995-7, Computing Science Department, University of Glasgow 199
- Kärkkäinen J., Suffix cactus: A cross between suffix tree and suffix array, In: Proc. 6th Symposium on Combinatorial Pattern Matching, CPM95, 1995, pp. 191–204 (1995) MR1467515
- Kurtz S., 10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O, Software–Practice and Experience 29 (1999), 13, 1149-1171 (1999) DOI10.1002/(SICI)1097-024X(199911)29:13<1149::AID-SPE274>3.0.CO;2-O
- Melichar B., Approximate string matching by finite automata, In: Computer Analysis of Images and Patterns (Lecture Notes in Computer Science 970), Springer–Verlag, Berlin 1995
- Melichar B., Fulltext Systems, Publishing House CTU, Prague 1996 (in Czech) (1996)
- Melichar B., Pattern matching and finite automata, In: Proceedings of the Prague Stringology Club Workshop ’97, Prague 1997
NotesEmbed ?
topTo embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.