# Correcting spelling errors by modelling their causes

Sebastian Deorowicz; Marcin Ciura

International Journal of Applied Mathematics and Computer Science (2005)

- Volume: 15, Issue: 2, page 275-285
- ISSN: 1641-876X

## Access Full Article

top## Abstract

top## How to cite

topDeorowicz, Sebastian, and Ciura, Marcin. "Correcting spelling errors by modelling their causes." International Journal of Applied Mathematics and Computer Science 15.2 (2005): 275-285. <http://eudml.org/doc/207742>.

@article{Deorowicz2005,

abstract = {This paper accounts for a new technique of correcting isolated words in typed texts. A language-dependent set of string substitutions reflects the surface form of errors that result from vocabulary incompetence, misspellings, or mistypings. Candidate corrections are formed by applying the substitutions to text words absent from the computer lexicon. A minimal acyclic deterministic finite automaton storing the lexicon allows quick rejection of nonsense corrections, while costs associated with the substitutions serve to rank the remaining ones. A comparison of the correction lists generated by several spellcheckers for two corpora of English spelling errors shows that our technique suggests the right words more accurately than the others.},

author = {Deorowicz, Sebastian, Ciura, Marcin},

journal = {International Journal of Applied Mathematics and Computer Science},

keywords = {spelling errors; finite state automata; spelling correction},

language = {eng},

number = {2},

pages = {275-285},

title = {Correcting spelling errors by modelling their causes},

url = {http://eudml.org/doc/207742},

volume = {15},

year = {2005},

}

TY - JOUR

AU - Deorowicz, Sebastian

AU - Ciura, Marcin

TI - Correcting spelling errors by modelling their causes

JO - International Journal of Applied Mathematics and Computer Science

PY - 2005

VL - 15

IS - 2

SP - 275

EP - 285

AB - This paper accounts for a new technique of correcting isolated words in typed texts. A language-dependent set of string substitutions reflects the surface form of errors that result from vocabulary incompetence, misspellings, or mistypings. Candidate corrections are formed by applying the substitutions to text words absent from the computer lexicon. A minimal acyclic deterministic finite automaton storing the lexicon allows quick rejection of nonsense corrections, while costs associated with the substitutions serve to rank the remaining ones. A comparison of the correction lists generated by several spellcheckers for two corpora of English spelling errors shows that our technique suggests the right words more accurately than the others.

LA - eng

KW - spelling errors; finite state automata; spelling correction

UR - http://eudml.org/doc/207742

ER -

## References

top- Atkinson K. (2002): Spell checker test kernel results. - Available at http://aspell.net/test
- Atkinson K. (2003): GNU Aspell. - Available at http://aspell.sourceforge.net/
- Baeza-Yates R. and Navarro G. (1998): Fast approximate string matching in a dictionary. - Proc. 5th Int. Symp. String Processing and Information Retrieval, SPIRE'98,Santa Crus de la Sierra, Bolivia, pp. 14-22.
- Berkel B. van and Smedt K.D. (1988): Triphone analysis: A combined method for the correction of orthographical and typographical errors. -Proc. 2nd Conf. Applied Natural Language Processing, Austin, pp. 77-83.
- Brill E. and Moore R.C. (2000): An improved error model for noisy channel spelling correction. -Proc. 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, pp. 286-293.
- Carrasco R. and Forcada M. (2002): Incremental construction and maintenance of minimal finite-state automata. - Comput. Linguistics, Vol. 28, No. 2, pp. 207-216. Zbl1232.68080
- Church K.W. and Gale W.A. (1991): Probability scoring for spelling correction. -Statist. Comput., Vol. 1, No. 1, pp. 93-103.
- Ciura M.G. and Deorowicz S. (2001): How to squeeze a lexicon. -Soft. Pract. Exper., Vol. 31, No. 11, pp. 1077-1090. Zbl0987.68782
- Czech Z.J., Havas G. and Majewski B.H. (1997): Perfect hashing. -Theoret. Comput. Sci., Vol. 182, No. 1-2, pp. 1-143.
- Daciuk J., Mihov S., Watson B.W. and Watson R.E. (2000): Incremental construction of minimal acyclic finite-state automata. -Comput. Linguistics, Vol. 26, No. 1, pp. 3-16. Zbl1232.68081
- Damerau F.J. (1964): A technique for computer detection and correction of spelling errors. -Comm. ACM, Vol. 7, No. 3, pp. 171-176.
- Damerau F.J. (1990): Evaluating computer-generated domain-vocabularies. -Inf. Process. Manag., Vol. 26, No. 6, pp. 791-801.
- Damerau F.J. and Mays E. (1989): An examination of undetected typing errors. -Inf. Process. Manag., Vol. 25, No. 6, pp. 659-664.
- Darragh J.J., Cleary J.G. and Witten I.H. (1993): Bonsai: A compact representation of trees. -Softw. Pract. Exper., Vol. 23, No. 3, pp. 277-291.
- Gadd T.N. (1990): PHONIX: The algorithm. -Program: Automat. Library Inf. Syst.,Vol. 24, No. 4, pp. 363-366.
- Grudin J. (1983): Error patterns in skilled and novice transcription typing, In: Cognitive Aspects of Skilled Typewriting, (W.E. Copper, Ed.). -New York: Springer-Verlag, pp. 121-143.
- Hodge V.J. and Austin J. (2003): A comparison of standard spell checking algorithms and a novel binary neural approach. - IEEE Trans. Knowl. Data Eng., Vol. 15, No. 5, pp. 1073-1081.
- Jassem W. (1983): The Phonology of Modern English. -Warsaw: Polish Scientific Publishers.
- Knuth D.E. (1973): The Art of Computer Programming (Vol. 3. Sorting and Searching Algorithms). - Reading, MA: Addison-Wesley. Zbl0302.68010
- Kuenning G.H. (2003): International ispell. - Available at http://www.cs.hmc.edu/~geoff/ispell.html.
- Kukich K. (1988): Variations on a back-propagation name recognition net. - Proc. Advanced Technology Conference, U.S. Postal Service, Wash. D.C., USA, pp. 722-735.
- Kukich K. (1992): Techniques for automatically correcting words in text. - ACM Comput. Surveys, Vol. 24, No. 4, pp. 377-439.
- Levenshtein V.I. (1966): Binary codes capable of correcting deletions, insertions and reversals. - Soviet Physics Doklady, Vol. 10, pp. 707-710.
- Maly K. (1976): Compressed tries. - Comm. ACM, Vol. 19, No. 7, pp. 409-415. Zbl0329.68037
- Mateescu D. (2003): English phonetics and phonological theory. - University of Bucharest, Romania. Available at http://www.unibuc.ro/eBooks/filologie/mateescu
- Merriam-Webster (2002): A dictionary of prefixes, suffixes, and combining forms from Webster's third new international dictionary, unabridged. - Merriam-Webster. Available at http://www.spellingbee.com/pre_suf_comb.pdf
- Mihov S. and Schulz K.U. (2004): Fast approximate search in large dictionaries. - Comput. Linguistics, Vol. 30, No. 4, pp. 451-477. Zbl1234.68424
- Mitton R. (1987): Spelling checkers, spelling correctors, and the misspellings of poor spellers. - Inf. Process. Manag., Vol. 23, No. 5, pp. 495-505.
- Mitton R. (1996): Spellchecking by computer. - J. Simplif. Spell. Soc., Vol. 20. No. 1, pp. 4-11.
- Morrison D.R. (1968): PATRICIA-Practical Algorithm to Retrieve Information Coded in Alphanumeric. - J. ACM, Vol. 15, No. 4, pp. 514-534.
- Odell M.K. and Russell R.C. (1918): U.S. Patent Numbers, 1,261,167 (1918) and 1,435,663 (1922). - U.S. Patent Office, Washington, D.C.
- Oflazer K. (1996): Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction. - Comput. Linguistics, Vol. 22, No. 1, pp. 73-89.
- Peterson J.L. (1986): A note on undetected typing errors. - Comm. ACM, Vol. 29, No. 7, pp. 633-637.
- Philips L. (1990): Hanging on the metaphone. - Comput. Lang. Mag., Vol. 7, No. 12, pp. 38-44.
- Philips L. (2000): The double metaphone search algorithm. - CC++ Users Journal, Vol. 18, No. 6.
- Pollock J.J. and Zamora A. (1983): Collection and characterization of spelling errors in scientific and scholary text. - J. Amer. Soc. Inf. Sci., Vol. 34, No. 1, pp. 51-58.
- Pollock J.J. and Zamora A. (1984): Automatic spelling correction in scientific and scholarly text. - Comm. ACM, Vol. 27, No. 4, pp. 358-368.
- Savary A. (2001): Typographical nearest-neighbor search in a finite-state lexicon and its application to spelling correction. - Lect. Notes Comput. Sci., Vol. 2494, pp. 251-260. Zbl1015.68555
- Schulz K.U. and Mihov S. (2002): Fast string correction with Levenshtein-automata. - Int. J. Docum. Anal. Recognit., Vol. 5, No. 1, pp. 67-85. Zbl1039.68137
- Toutanova K. and Moore R.C. (2002): Pronunciation modelling for improved spelling correction. - Proc. 40th Annual Meeting of the Association for Computational Linguistics, Hong Kong, pp. 144-151.
- Wagner R.A. (1974): Order-n correction for regular languages. - Comm. ACM, Vol. 17, No. 5, pp. 265-268. Zbl0276.68011
- Wikipedia (2003): Wikipedia: List of common misspellings. - Available at http://en2.wikipedia.org/wiki/Wikipedia:List_of_common_misspellings
- Yannakoudakis E.J. and Fawthrop D. (1983a): An intelligent spelling corrector. - Inf. Process. Manag., Vol. 19, No. 12, pp. 101-108.
- Yannakoudakis E.J. and Fawthrop D. (1983b): The rules of spelling errors. - Inf. Process. Manag., Vol. 19, No. 2, pp. 87-99.

## NotesEmbed ?

topTo embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.