Applying A Normalized Compression Metric To The Measurement Of Dialect Distance

Simov, Kiril; Osenova, Petya

Serdica Journal of Computing (2007)

  • Volume: 1, Issue: 1, pages 73-86
  • ISSN: 1312-6555

Abstract

The paper discusses the application of a similarity metric based on compression to the measurement of the distance among Bulgarian dialects. The similarity metric is defined on the basis of the notion of Kolmogorov complexity of a file (or binary string). The application of Kolmogorov complexity in practice is not possible because its calculation over a file is an undecidable problem. Thus, the actual similarity metric is based on a real-life compressor which only approximates the Kolmogorov complexity. To use the metric for distance measurement of Bulgarian dialects we first represent the dialectological data in such a way that the metric is applicable. We propose two such representations which are compared to a baseline distance between dialects. Then we conclude the paper with an outline of our future work.
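The abstract does not give the exact formula or name the compressor. As an illustration only, the sketch below assumes the widely used normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s and stands in for the uncomputable Kolmogorov complexity. Python's standard `zlib` module is an assumed stand-in compressor; the function names `compressed_size`, `ncd` and `distance_matrix` and the sample strings are hypothetical and are not the paper's data representation.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # C(x): length of the zlib output at the highest compression level (9).
    # A real compressor only approximates the Kolmogorov complexity K(x).
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance (assumed formula, see lead-in):
    # values near 0 mean "very similar", values near 1 mean "unrelated".
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation xy
    # max(cx, cy) is never 0: zlib output always contains header bytes.
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(samples: dict[str, bytes]) -> dict[tuple[str, str], float]:
    # Pairwise NCD over every ordered pair of named samples (diagonal included).
    names = sorted(samples)
    return {(a, b): ncd(samples[a], samples[b]) for a in names for b in names}


if __name__ == "__main__":
    # Hypothetical toy strings invented for illustration; each "dialect" is
    # just UTF-8 text of a few word forms, NOT the representations proposed
    # in the paper.
    samples = {
        "dialect_A": "бял хляб бели хлябове".encode("utf-8"),
        "dialect_B": "бел хлеб бели хлебове".encode("utf-8"),
        "dialect_C": "бял хляб бяли хлябове".encode("utf-8"),
    }
    for (a, b), d in sorted(distance_matrix(samples).items()):
        print(f"{a}  {b}  {d:.3f}")
```

Two design points are worth keeping in mind. With a real compressor, C(xy) and C(yx) usually differ slightly, so the matrix is only approximately symmetric and some implementations average the two directions. Very short inputs are dominated by the compressor's fixed header overhead, so NCD values on strings this small are noisy; meaningful comparisons need substantially larger samples per dialect.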

How to cite


Simov, Kiril, and Petya Osenova. "Applying A Normalized Compression Metric To The Measurement Of Dialect Distance." Serdica Journal of Computing 1.1 (2007): 73-86. <http://eudml.org/doc/11413>.

@article{Simov2007,
abstract = {The paper discusses the application of a similarity metric based on compression to the measurement of the distance among Bulgarian dialects. The similarity metric is defined on the basis of the notion of Kolmogorov complexity of a file (or binary string). The application of Kolmogorov complexity in practice is not possible because its calculation over a file is an undecidable problem. Thus, the actual similarity metric is based on a real-life compressor which only approximates the Kolmogorov complexity. To use the metric for distance measurement of Bulgarian dialects we first represent the dialectological data in such a way that the metric is applicable. We propose two such representations which are compared to a baseline distance between dialects. Then we conclude the paper with an outline of our future work.},
author = {Simov, Kiril and Osenova, Petya},
journal = {Serdica Journal of Computing},
keywords = {Kolmogorov Complexity; Compression Metric; Dialect Distance; Language Contacts},
language = {eng},
number = {1},
pages = {73-86},
publisher = {Institute of Mathematics and Informatics Bulgarian Academy of Sciences},
title = {Applying A Normalized Compression Metric To The Measurement Of Dialect Distance},
url = {http://eudml.org/doc/11413},
volume = {1},
year = {2007},
}

TY - JOUR
AU - Simov, Kiril
AU - Osenova, Petya
TI - Applying A Normalized Compression Metric To The Measurement Of Dialect Distance
JO - Serdica Journal of Computing
PY - 2007
PB - Institute of Mathematics and Informatics Bulgarian Academy of Sciences
VL - 1
IS - 1
SP - 73
EP - 86
AB - The paper discusses the application of a similarity metric based on compression to the measurement of the distance among Bulgarian dialects. The similarity metric is defined on the basis of the notion of Kolmogorov complexity of a file (or binary string). The application of Kolmogorov complexity in practice is not possible because its calculation over a file is an undecidable problem. Thus, the actual similarity metric is based on a real-life compressor which only approximates the Kolmogorov complexity. To use the metric for distance measurement of Bulgarian dialects we first represent the dialectological data in such a way that the metric is applicable. We propose two such representations which are compared to a baseline distance between dialects. Then we conclude the paper with an outline of our future work.
LA - eng
KW - Kolmogorov Complexity; Compression Metric; Dialect Distance; Language Contacts
UR - http://eudml.org/doc/11413
ER -
