Applying A Normalized Compression Metric To The Measurement Of Dialect Distance
Serdica Journal of Computing (2007)
- Volume: 1, Issue: 1, page 73-86
- ISSN: 1312-6555
How to cite
Simov, Kiril, and Osenova, Petya. "Applying A Normalized Compression Metric To The Measurement Of Dialect Distance." Serdica Journal of Computing 1.1 (2007): 73-86. <http://eudml.org/doc/11413>.
@article{Simov2007,
abstract = {The paper discusses the application of a similarity metric based
on compression to the measurement of the distance among Bulgarian dialects.
The similarity metric is defined on the basis of the notion of Kolmogorov
complexity of a file (or binary string). The application of Kolmogorov
complexity in practice is not possible because its calculation over a file is an
undecidable problem. Thus, the actual similarity metric is based on a real life
compressor which only approximates the Kolmogorov complexity. To use the
metric for distance measurement of Bulgarian dialects we first represent the
dialectological data in such a way that the metric is applicable. We propose
two such representations which are compared to a baseline distance between
dialects. Then we conclude the paper with an outline of our future work.},
author = {Simov, Kiril and Osenova, Petya},
journal = {Serdica Journal of Computing},
keywords = {Kolmogorov Complexity; Compression Metric; Dialect Distance; Language Contacts},
language = {eng},
number = {1},
pages = {73--86},
publisher = {Institute of Mathematics and Informatics Bulgarian Academy of Sciences},
title = {Applying A Normalized Compression Metric To The Measurement Of Dialect Distance},
url = {http://eudml.org/doc/11413},
volume = {1},
year = {2007},
}
TY - JOUR
AU - Simov, Kiril
AU - Osenova, Petya
TI - Applying A Normalized Compression Metric To The Measurement Of Dialect Distance
JO - Serdica Journal of Computing
PY - 2007
PB - Institute of Mathematics and Informatics Bulgarian Academy of Sciences
VL - 1
IS - 1
SP - 73
EP - 86
AB - The paper discusses the application of a similarity metric based
on compression to the measurement of the distance among Bulgarian dialects.
The similarity metric is defined on the basis of the notion of Kolmogorov
complexity of a file (or binary string). The application of Kolmogorov
complexity in practice is not possible because its calculation over a file is an
undecidable problem. Thus, the actual similarity metric is based on a real life
compressor which only approximates the Kolmogorov complexity. To use the
metric for distance measurement of Bulgarian dialects we first represent the
dialectological data in such a way that the metric is applicable. We propose
two such representations which are compared to a baseline distance between
dialects. Then we conclude the paper with an outline of our future work.
LA - eng
KW - Kolmogorov Complexity
KW - Compression Metric
KW - Dialect Distance
KW - Language Contacts
UR - http://eudml.org/doc/11413
ER -
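
The abstract describes a similarity metric in which a real-life compressor stands in for the uncomputable Kolmogorov complexity. A minimal sketch of that idea follows. It assumes the commonly used normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with zlib as the compressor. Neither the formula nor the compressor is confirmed by this record as the authors' exact choice, and the function names and sample strings are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a rough stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Assumption: the standard NCD formula, approximated with zlib.
    Values near 0 suggest similar inputs, values near 1 dissimilar ones.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical data: two made-up word lists standing in for dialect records.
    site_a = b"mlyako hlyab voda " * 20
    site_b = b"mleko hlyab voda " * 20
    print(ncd(site_a, site_a), ncd(site_a, site_b))
```

Because the result depends on the compressor and on how the dialectological data is encoded as bytes, a sketch like this only illustrates the mechanism; the two representations proposed in the paper are not reproduced here.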