Automatic Identification of False Friends in Parallel Corpora: Statistical and Semantic Approach
Serdica Journal of Computing (2009)
- Volume: 3, Issue: 2, pages 133-158
- ISSN: 1312-6555
Nakov, Svetlin. "Automatic Identification of False Friends in Parallel Corpora: Statistical and Semantic Approach." Serdica Journal of Computing 3.2 (2009): 133-158. <http://eudml.org/doc/11445>.
@article{Nakov2009,
abstract = {False friends are pairs of words in two languages that are perceived as
similar but have different meanings. We present an improved
algorithm for acquiring false friends from a sentence-level aligned parallel corpus
based on statistical observations of word occurrences and co-occurrences
in the parallel sentences. The results are compared with an entirely semantic
measure for cross-lingual similarity between words based on using the Web
as a corpus through analyzing the words’ local contexts extracted from the
text snippets returned by searching in Google. The statistical and semantic
measures are further combined into an improved algorithm for identification
of false friends that achieves results almost twice as good as those of previously
known algorithms. The evaluation is performed for identifying cognates
between Bulgarian and Russian, but the proposed methods could be adopted
for other language pairs for which parallel corpora and bilingual glossaries
are available.},
author = {Nakov, Svetlin},
journal = {Serdica Journal of Computing},
keywords = {Cognates; False Friends; Identification of False Friends; Parallel Corpus; Cross-Lingual Semantic Similarity; Web as a Corpus},
language = {eng},
number = {2},
pages = {133--158},
publisher = {Institute of Mathematics and Informatics Bulgarian Academy of Sciences},
title = {Automatic Identification of False Friends in Parallel Corpora: Statistical and Semantic Approach},
url = {http://eudml.org/doc/11445},
volume = {3},
year = {2009},
}
TY - JOUR
AU - Nakov, Svetlin
TI - Automatic Identification of False Friends in Parallel Corpora: Statistical and Semantic Approach
JO - Serdica Journal of Computing
PY - 2009
PB - Institute of Mathematics and Informatics Bulgarian Academy of Sciences
VL - 3
IS - 2
SP - 133
EP - 158
AB - False friends are pairs of words in two languages that are perceived as
similar but have different meanings. We present an improved
algorithm for acquiring false friends from a sentence-level aligned parallel corpus
based on statistical observations of word occurrences and co-occurrences
in the parallel sentences. The results are compared with an entirely semantic
measure for cross-lingual similarity between words based on using the Web
as a corpus through analyzing the words’ local contexts extracted from the
text snippets returned by searching in Google. The statistical and semantic
measures are further combined into an improved algorithm for identification
of false friends that achieves results almost twice as good as those of previously
known algorithms. The evaluation is performed for identifying cognates
between Bulgarian and Russian, but the proposed methods could be adopted
for other language pairs for which parallel corpora and bilingual glossaries
are available.
LA - eng
KW - Cognates; False Friends; Identification of False Friends; Parallel Corpus; Cross-Lingual Semantic Similarity; Web as a Corpus
UR - http://eudml.org/doc/11445
ER -