A new practical linear space algorithm for the longest common subsequence problem

Heiko Goeman; Michael Clausen

Kybernetika (2002)

  • Volume: 38, Issue: 1, page [45]-66
  • ISSN: 0023-5954

Abstract

top
This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths m and n , n m , on an alphabet of size s , we first present an algorithm which determines the length p of an LCS in O ( n s + min { m p , p ( n - p ) } ) time and O ( n s ) space. This result has been achieved before [ric94,ric95], but our algorithm is significantly faster than previous methods. We also provide a second algorithm which generates an LCS in O ( n s + min { m p , m log m + p ( n - p ) } ) time while preserving the linear space bound, thus solving the problem posed in [ric94,ric95]. Experimental results confirm the efficiency of our method.

How to cite

top

Goeman, Heiko, and Clausen, Michael. "A new practical linear space algorithm for the longest common subsequence problem." Kybernetika 38.1 (2002): [45]-66. <http://eudml.org/doc/33566>.

@article{Goeman2002,
abstract = {This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths $m$ and $n$, $n\ge m$, on an alphabet of size $s$, we first present an algorithm which determines the length $p$ of an LCS in $O(ns+\min \lbrace mp,p(n-p)\rbrace )$ time and $O(ns)$ space. This result has been achieved before [ric94,ric95], but our algorithm is significantly faster than previous methods. We also provide a second algorithm which generates an LCS in $O(ns+\min \lbrace mp,m\log m+p(n-p)\rbrace )$ time while preserving the linear space bound, thus solving the problem posed in [ric94,ric95]. Experimental results confirm the efficiency of our method.},
author = {Goeman, Heiko, Clausen, Michael},
journal = {Kybernetika},
keywords = {longest common subsequence problem; Rick’s algorithm; longest common subsequence problem; Rick's algorithm},
language = {eng},
number = {1},
pages = {[45]-66},
publisher = {Institute of Information Theory and Automation AS CR},
title = {A new practical linear space algorithm for the longest common subsequence problem},
url = {http://eudml.org/doc/33566},
volume = {38},
year = {2002},
}

TY - JOUR
AU - Goeman, Heiko
AU - Clausen, Michael
TI - A new practical linear space algorithm for the longest common subsequence problem
JO - Kybernetika
PY - 2002
PB - Institute of Information Theory and Automation AS CR
VL - 38
IS - 1
SP - [45]
EP - 66
AB - This paper deals with a new practical method for solving the longest common subsequence (LCS) problem. Given two strings of lengths $m$ and $n$, $n\ge m$, on an alphabet of size $s$, we first present an algorithm which determines the length $p$ of an LCS in $O(ns+\min \lbrace mp,p(n-p)\rbrace )$ time and $O(ns)$ space. This result has been achieved before [ric94,ric95], but our algorithm is significantly faster than previous methods. We also provide a second algorithm which generates an LCS in $O(ns+\min \lbrace mp,m\log m+p(n-p)\rbrace )$ time while preserving the linear space bound, thus solving the problem posed in [ric94,ric95]. Experimental results confirm the efficiency of our method.
LA - eng
KW - longest common subsequence problem; Rick’s algorithm; longest common subsequence problem; Rick's algorithm
UR - http://eudml.org/doc/33566
ER -

References

top
  1. Aho A. V., Hirschberg D. S., Ullman J. D., 10.1145/321921.321922, J. Assoc. Comput. Mach. 23 (1976), 1, 1–12 (1976) Zbl0316.68027MR0398159DOI10.1145/321921.321922
  2. Apostolico A., 10.1016/0020-0190(86)90044-X, Inform. Process. Lett. 23 (1986), 63–69 (1986) Zbl0608.68057MR0858225DOI10.1016/0020-0190(86)90044-X
  3. Apostolico A., 10.1016/0020-0190(87)90167-0, Inform. Process. Lett. 25 (1987), 235–236 (1987) MR0896141DOI10.1016/0020-0190(87)90167-0
  4. Apostolico A., Guerra G., 10.1007/BF01840365, Algorithmica 2 (1987), 315–336 (1987) Zbl0636.68083MR0911954DOI10.1007/BF01840365
  5. Apostolico A., Browne, S., Guerra C., 10.1016/0304-3975(92)90132-Y, Theoret. Comput. Sci. 92 (1992), 3–17 (1992) Zbl0747.68019MR1143128DOI10.1016/0304-3975(92)90132-Y
  6. Chin F. Y. L., Poon C. K., A fast algorithm for computing longest common subsequences of small alphabet size, J. Inform. Process. 13 (1990), 4, 463–469 (1990) Zbl0768.68020
  7. Chvátal V., Sankoff D., 10.2307/3212444, J. Appl. Probab. 12 (1975), 306–315 (1975) MR0405531DOI10.2307/3212444
  8. Dančík V., Paterson M., Upper bounds for the expected length of a longest common subsequence of two binary sequences, In: Proceedings 11th Annual Symp. on Theoretical Aspects of Computer Science (Lecture Notes in Computer Science 775), Springer-Verlag, Berlin 1994, pp. 669–678 (1994) Zbl0941.68812MR1288571
  9. Dayhoff M. O., 10.1016/0022-5193(65)90096-2, J. Theoret. Biol. 8 (1965), 97–112 (1965) DOI10.1016/0022-5193(65)90096-2
  10. Dayhoff M. O., 10.1038/scientificamerican0769-86, Sci. Amer. 221 (1969), 1, 86–95 (1969) DOI10.1038/scientificamerican0769-86
  11. Deken J. G., 10.1016/0012-365X(79)90057-8, Discrete Math. 26 (1979), 17–31 (1979) Zbl0403.68064MR0535080DOI10.1016/0012-365X(79)90057-8
  12. Dilworth R. P., 10.2307/1969503, Ann. of Math. 51 (1950), 161–166 (1950) Zbl0038.02003MR0032578DOI10.2307/1969503
  13. Fu K. S., Bhargava B. K., Tree systems for syntactic pattern recognition, IEEE Trans. Comput. C–22 (1973), 12, 1087–1099 (1973) Zbl0269.68059MR0359424
  14. Gallant J., Maier, D., Storer J. A., 10.1016/0022-0000(80)90004-5, J. Comput. System Sci. 20 (1980), 50–58 (1980) Zbl0436.68043MR0566641DOI10.1016/0022-0000(80)90004-5
  15. Goeman H., Time and Space Efficient Algorithms for Decomposing Certain Partially Ordered Sets, PhD Thesis, Department of Computer Science, University of Bonn 1999. To appear in Bayreuther Mathematische Schriften (1999) MR1942444
  16. Hirschberg D. S., 10.1145/360825.360861, Comm. ACM 18 (1975), 6, 341–343 (1975) Zbl0301.68042MR0375829DOI10.1145/360825.360861
  17. Hirschberg D. S., 10.1145/322033.322044, J. Assoc. Comput. Mach. 24 (1977), 4, 664–675 (1977) Zbl0402.68041MR0461985DOI10.1145/322033.322044
  18. Hsu W. J., Du M. W., 10.1016/0022-0000(84)90025-4, J. Comput. System Sci. 29 (1984), 133–152 (1984) Zbl0587.68045MR0773417DOI10.1016/0022-0000(84)90025-4
  19. Hunt J. W., Szymanski T. G., 10.1145/359581.359603, Comm. ACM 20 (1977), 5, 350–353 (1977) Zbl0354.68078MR0436655DOI10.1145/359581.359603
  20. Kumar S. K., Rangan C. P., 10.1007/BF00265993, Acta Inform. 24 (1987), 353–363 (1987) Zbl0642.68066MR0894559DOI10.1007/BF00265993
  21. Lowrance R., Wagner R. A., 10.1145/321879.321880, J. Assoc. Comput. Mach. 22, (1975), 2, 177–183 (1975) Zbl0301.68028MR0366088DOI10.1145/321879.321880
  22. Lu S. Y., Fu K. S., 10.1109/TSMC.1978.4309979, IEEE Trans. Systems Man Cybernet. SMC–8, (1978), 5, 381–389 (1978) Zbl0378.68048MR0471478DOI10.1109/TSMC.1978.4309979
  23. Maier D., 10.1145/322063.322075, J. Assoc. Comput. Mach. 25 (1978), 2, 322–336 (1978) MR0483700DOI10.1145/322063.322075
  24. Masek W. J., Paterson M. S., 10.1016/0022-0000(80)90002-1, J. Comput. System Sci. 20 (1980), 1, 18–31 (1980) MR0566639DOI10.1016/0022-0000(80)90002-1
  25. Myers E. W., 10.1007/BF01840446, Algorithmica 1 (1986), 251–266 (1986) Zbl0639.68054MR1554305DOI10.1007/BF01840446
  26. Nakatsu N., Kambayashi, Y., Yajima S., 10.1007/BF00264437, Acta Inform. 18 (1982), 171–179 (1982) Zbl0493.68041MR0687701DOI10.1007/BF00264437
  27. Needleman S. B., Wunsch C. S., 10.1016/0022-2836(70)90057-4, J. Molecular Biol. 48 (1970), 443–453 (1970) DOI10.1016/0022-2836(70)90057-4
  28. Paterson M., Dančík V., Longest common subsequences, In: Proceedings, 19th Intern. Symp. on Mathematical Foundations of Computer Science (Lecture Notes in Computer Science 841), Springer Verlag, Berlin 1994, pp. 127–142 (19th) MR1319827
  29. Rick C., New Algorithms for the Longest Common Subsequence Problem, Research Report No. 85123–CS, Department of Computer Science, University of Bonn 1994 
  30. Rick C., A new flexible algorithm for the longest common subsequence problem, In: Proceedings, 6th Annual Symp. on Combinatorial Pattern Matching (Lecture Notes in Computer Science 937), Springer Verlag, Berlin 1995, pp. 340–351 (1995) Zbl0841.68051MR1467525
  31. Sankoff D., Cedergren R. J., 10.1016/0022-2836(73)90369-0, J. Molecular Biol. 77 (1973), 159–164 (1973) DOI10.1016/0022-2836(73)90369-0
  32. Sankoff D., Kruskal J. B., Time Warps, String Edits, and Macromolecules: The Theory And Practice of Sequence Comparison, Addison–Wesley, Reading, MA 1983 MR0726027
  33. Ukkonen E., 10.1016/S0019-9958(85)80046-2, Inform. and Control 64 (1985), 100–118 (1985) Zbl0575.68090MR0837093DOI10.1016/S0019-9958(85)80046-2
  34. Wagner R. A., On the complexity of the extended string–to–string correction problem, In: Proceedings, 7th Ann. ACM Sympos. on Theory of Comput. 1975, pp. 218–223 (1975) Zbl0357.68047MR0445910
  35. Wagner R. A., Fischer M. J., 10.1145/321796.321811, J. Assoc. Comput. Mach. 21 (1974), 1, 168–173 (1974) Zbl0278.68032MR0356576DOI10.1145/321796.321811
  36. Wong C. K., Chandra A. K., 10.1145/321921.321923, J. Assoc. Comput. Mach. 28 (1976), 1, 13–18 (1976) Zbl0316.68019MR0398173DOI10.1145/321921.321923
  37. Wu S., Manber U., Myers, G., Miller W., 10.1016/0020-0190(90)90035-V, Inform. Process. Lett. 35 (1990), 317–323 (1990) Zbl0698.68055MR1079428DOI10.1016/0020-0190(90)90035-V

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.