Displaying similar documents to “OCR Rate Computation in Mass Digitization Programs”

Noun Sense Disambiguation using Co-Occurrence Relation in Machine Translation

Choe, Changil, Kim, Hyonil (2012)

Serdica Journal of Computing

Word Sense Disambiguation, the process of identifying the meaning of a word in a sentence when the word has multiple meanings, is a critical problem of machine translation. It is generally very difficult to select the correct meaning of a word in a sentence, especially when the syntactical difference between the source and target language is big, e.g., English-Korean machine translation. To achieve a high level of accuracy of noun sense selection in machine translation, we introduced...

Hausdorff Distances for Searching in Binary Text Images

Andreev, Andrey, Kirov, Nikolay (2009)

Serdica Journal of Computing

This work has been partially supported by Grant No. DO 02-275, 16.12.2008, Bulgarian NSF, Ministry of Education and Science. Hausdorff distance (HD) seems the most efficient instrument for measuring how far two compact non-empty subsets of a metric space are from each other. This paper considers the possibilities provided by HD and some of its modifications used recently by many authors for resemblance between binary text images. Summarizing part of the existing word image matching...
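As a rough illustration of the classical Hausdorff distance mentioned in this abstract, the sketch below computes it between the foreground pixels of two tiny binary images. It covers only the textbook definition on finite point sets, not the modifications the paper surveys. The helper names (`foreground_points`, `directed_hausdorff`, `hausdorff`) and the sample images are assumptions made for illustration.

```python
import math

def foreground_points(image):
    """Return (row, col) coordinates of all non-zero pixels in a binary image."""
    return [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]

def directed_hausdorff(a, b):
    """Largest distance from any point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Two hypothetical 2x3 binary "word images" (1 = ink, 0 = background).
word_a = [[1, 1, 0],
          [0, 0, 0]]
word_b = [[1, 0, 0],
          [0, 0, 1]]

distance = hausdorff(foreground_points(word_a), foreground_points(word_b))
print(distance)
```

This brute-force version compares every pair of points, so it is only practical for small images. Both point sets must be non-empty, matching the definition's requirement of non-empty subsets.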

Correcting spelling errors by modelling their causes

Sebastian Deorowicz, Marcin Ciura (2005)

International Journal of Applied Mathematics and Computer Science

This paper accounts for a new technique of correcting isolated words in typed texts. A language-dependent set of string substitutions reflects the surface form of errors that result from vocabulary incompetence, misspellings, or mistypings. Candidate corrections are formed by applying the substitutions to text words absent from the computer lexicon. A minimal acyclic deterministic finite automaton storing the lexicon allows quick rejection of nonsense corrections, while costs associated...

An isolated word recognition system based on a low-complexity parametrization procedure.

Héctor Rulot Segovia, Enrique Vidal Ruiz, Francisco Casacuberta Nolla (1984)

Qüestiió

An Isolated Word Recognition System is presented in this paper which uses a parametrization scheme based on the two-level clipped signal Autocorrelation Function. The system prototype runs on a 64 kby. partition of a general-purpose minicomputer with quite small specific hardware requirements and, for moderate-sized dictionaries (≤ 40 words), gives 95-98% recognition rates with response times better than two times real time. The system uses classical Dynamic Programming word-matching,...