Efficient validation and construction of border arrays and validation of string matching automata

Jean-Pierre Duval; Thierry Lecroq; Arnaud Lefebvre

RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications (2009)

  • Volume: 43, Issue: 2, page 281-297
  • ISSN: 0988-3754

Abstract

We present an on-line, linear time and space algorithm to check whether an integer array f is the border array of at least one string w built on an alphabet Σ of bounded or unbounded size. First, we show a bijection between the border array of a string w and the skeleton of the DFA recognizing Σ*w, called a string matching automaton (SMA). Different strings can have the same border array, but the originality of the presented method is that the correspondence between a border array and the skeleton of an SMA is independent of the underlying strings. This makes it possible to design algorithms for validating and generating border arrays that outperform existing ones. The validating algorithm lowers the delay (the maximal number of comparisons on one element of the array) from O(|w|) to 1 + min{|Σ|, 1 + log₂|w|} compared to existing algorithms. We then give results on the number of distinct border arrays depending on the alphabet size. We also present an algorithm that checks in linear time whether a given directed unlabeled graph G is the skeleton of an SMA on an alphabet of size s. In the process, the algorithm can build one string w for which G is the SMA skeleton.
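To make the notion of a border array concrete, here is a minimal Python sketch. It is not the algorithm of the paper: it is a simple off-line baseline for the unbounded-alphabet case only, and the function names (border_array, witness_string, is_border_array) are hypothetical. It assumes 0-based indexing, with f[i] the length of the longest border of the prefix w[0..i]. The validator builds the most general candidate string (copying a letter when f[i] > 0, introducing a fresh letter when f[i] = 0) and then recomputes its border array with the classical Morris-Pratt failure function.

```python
def border_array(w):
    """Return f where f[i] is the length of the longest proper border of w[:i+1]."""
    f = [0] * len(w)
    k = 0  # length of the longest border of the prefix processed so far
    for i in range(1, len(w)):
        # Fall back along the chain of shorter borders until one can be extended.
        while k > 0 and w[i] != w[k]:
            k = f[k - 1]
        if w[i] == w[k]:
            k += 1
        f[i] = k
    return f


def witness_string(f):
    """Build a candidate string (letters are integers) for f, or None if f is malformed.

    Assumption: the alphabet is unbounded, so a fresh letter is always available.
    """
    w = []
    fresh = 0
    for i, b in enumerate(f):
        if not 0 <= b <= i:      # a proper border of w[:i+1] has length at most i
            return None
        if b > 0:
            w.append(w[b - 1])   # last letter must match the last letter of the border
        else:
            w.append(fresh)      # no border: use a letter never seen before
            fresh += 1
    return w


def is_border_array(f):
    """Check whether f is the border array of some string over an unbounded alphabet."""
    w = witness_string(f)
    return w is not None and border_array(w) == list(f)
```

For instance, border_array("abaababa") and is_border_array([0, 0, 1, 1, 2, 3, 2, 3]) agree on the same array, while [0, 1, 1] is rejected because a border of length 1 at the second position forces all three letters to be equal. Unlike the on-line algorithm described in the abstract, this sketch needs the whole array before it can answer and says nothing about the minimal alphabet size.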

How to cite

Duval, Jean-Pierre, Lecroq, Thierry, and Lefebvre, Arnaud. "Efficient validation and construction of border arrays and validation of string matching automata." RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications 43.2 (2009): 281-297. <http://eudml.org/doc/244841>.

@article{Duval2009,
abstract = {We present an on-line, linear time and space algorithm to check whether an integer array $f$ is the border array of at least one string $w$ built on an alphabet $\Sigma$ of bounded or unbounded size. First, we show a bijection between the border array of a string $w$ and the skeleton of the DFA recognizing $\Sigma^*w$, called a string matching automaton (SMA). Different strings can have the same border array, but the originality of the presented method is that the correspondence between a border array and the skeleton of an SMA is independent of the underlying strings. This makes it possible to design algorithms for validating and generating border arrays that outperform existing ones. The validating algorithm lowers the delay (the maximal number of comparisons on one element of the array) from $O(|w|)$ to $1+\min \lbrace |\Sigma |,1+\log _2 |w|\rbrace $ compared to existing algorithms. We then give results on the number of distinct border arrays depending on the alphabet size. We also present an algorithm that checks in linear time whether a given directed unlabeled graph $G$ is the skeleton of an SMA on an alphabet of size $s$. In the process, the algorithm can build one string $w$ for which $G$ is the SMA skeleton.},
author = {Duval, Jean-Pierre and Lecroq, Thierry and Lefebvre, Arnaud},
journal = {RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications},
keywords = {combinatorics on words; period; border; string matching; string matching automata},
language = {eng},
number = {2},
pages = {281-297},
publisher = {EDP-Sciences},
title = {Efficient validation and construction of border arrays and validation of string matching automata},
url = {http://eudml.org/doc/244841},
volume = {43},
year = {2009},
}

TY - JOUR
AU - Duval, Jean-Pierre
AU - Lecroq, Thierry
AU - Lefebvre, Arnaud
TI - Efficient validation and construction of border arrays and validation of string matching automata
JO - RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications
PY - 2009
PB - EDP-Sciences
VL - 43
IS - 2
SP - 281
EP - 297
AB - We present an on-line, linear time and space algorithm to check whether an integer array $f$ is the border array of at least one string $w$ built on an alphabet $\Sigma$ of bounded or unbounded size. First, we show a bijection between the border array of a string $w$ and the skeleton of the DFA recognizing $\Sigma^*w$, called a string matching automaton (SMA). Different strings can have the same border array, but the originality of the presented method is that the correspondence between a border array and the skeleton of an SMA is independent of the underlying strings. This makes it possible to design algorithms for validating and generating border arrays that outperform existing ones. The validating algorithm lowers the delay (the maximal number of comparisons on one element of the array) from $O(|w|)$ to $1+\min \lbrace |\Sigma |,1+\log _2 |w|\rbrace $ compared to existing algorithms. We then give results on the number of distinct border arrays depending on the alphabet size. We also present an algorithm that checks in linear time whether a given directed unlabeled graph $G$ is the skeleton of an SMA on an alphabet of size $s$. In the process, the algorithm can build one string $w$ for which $G$ is the SMA skeleton.
LA - eng
KW - combinatorics on words; period; border; string matching; string matching automata
UR - http://eudml.org/doc/244841
ER -
