# Applying A Normalized Compression Metric To The Measurement Of Dialect Distance

Serdica Journal of Computing (2007)

- Volume: 1, Issue: 1, pages 73-86
- ISSN: 1312-6555

## How to cite

Simov, Kiril, and Osenova, Petya. "Applying A Normalized Compression Metric To The Measurement Of Dialect Distance." Serdica Journal of Computing 1.1 (2007): 73-86. <http://eudml.org/doc/11413>.

@article{Simov2007,

abstract = {The paper discusses the application of a similarity metric based
on compression to the measurement of the distance among Bulgarian dialects.
The similarity metric is defined on the basis of the notion of Kolmogorov
complexity of a file (or binary string). The application of Kolmogorov
complexity in practice is not possible because its calculation over a file is an
undecidable problem. Thus, the actual similarity metric is based on a real life
compressor which only approximates the Kolmogorov complexity. To use the
metric for distance measurement of Bulgarian dialects we first represent the
dialectological data in such a way that the metric is applicable. We propose
two such representations which are compared to a baseline distance between
dialects. Then we conclude the paper with an outline of our future work.},

author = {Simov, Kiril and Osenova, Petya},

journal = {Serdica Journal of Computing},

keywords = {Kolmogorov Complexity; Compression Metric; Dialect Distance; Language Contacts},

language = {eng},

number = {1},

pages = {73--86},

publisher = {Institute of Mathematics and Informatics, Bulgarian Academy of Sciences},

title = {Applying A Normalized Compression Metric To The Measurement Of Dialect Distance},

url = {http://eudml.org/doc/11413},

volume = {1},

year = {2007},

}

TY - JOUR

AU - Simov, Kiril

AU - Osenova, Petya

TI - Applying A Normalized Compression Metric To The Measurement Of Dialect Distance

JO - Serdica Journal of Computing

PY - 2007

PB - Institute of Mathematics and Informatics, Bulgarian Academy of Sciences

VL - 1

IS - 1

SP - 73

EP - 86

AB - The paper discusses the application of a similarity metric based
on compression to the measurement of the distance among Bulgarian dialects.
The similarity metric is defined on the basis of the notion of Kolmogorov
complexity of a file (or binary string). The application of Kolmogorov
complexity in practice is not possible because its calculation over a file is an
undecidable problem. Thus, the actual similarity metric is based on a real life
compressor which only approximates the Kolmogorov complexity. To use the
metric for distance measurement of Bulgarian dialects we first represent the
dialectological data in such a way that the metric is applicable. We propose
two such representations which are compared to a baseline distance between
dialects. Then we conclude the paper with an outline of our future work.

LA - eng

KW - Kolmogorov Complexity; Compression Metric; Dialect Distance; Language Contacts

UR - http://eudml.org/doc/11413

ER -
