Towards New Czechoslovak Hyphenation Patterns

Petr Sojka; Ondřej Sojka

Zpravodaj Československého sdružení uživatelů TeXu (2020)

  • Volume: 030, Issue: 3-4, page 118-126
  • ISSN: 1211-6661

Abstract

Space- and time-effective segmentation and hyphenation of natural languages lie at the core of every document preparation system, web browser, and mobile rendering system. Recently, the unreasonable effectiveness of pattern generation has been shown: hyphenation patterns can solve the dictionary problem for a single language without compromise. In this article, we show how we applied patgen to generate new Czechoslovak hyphenation patterns that cover two languages, Czech and Slovak. We show that developing more universal hyphenation patterns is feasible and allows for significant quality improvements and space savings. We evaluate the new approach and the new Czechoslovak hyphenation patterns.

How to cite

Sojka, Petr, and Sojka, Ondřej. "Towards New Czechoslovak Hyphenation Patterns." Zpravodaj Československého sdružení uživatelů TeXu 030.3-4 (2020): 118-126. <http://eudml.org/doc/298620>.

@article{Sojka2020,
abstract = {Space- and time-effective segmentation and hyphenation of natural languages lie at the core of every document preparation system, web browser, and mobile rendering system. Recently, the unreasonable effectiveness of pattern generation has been shown: hyphenation patterns can solve the dictionary problem for a single language without compromise. In this article, we show how we applied patgen to generate new Czechoslovak hyphenation patterns that cover two languages, Czech and Slovak. We show that developing more universal hyphenation patterns is feasible and allows for significant quality improvements and space savings. We evaluate the new approach and the new Czechoslovak hyphenation patterns.},
author = {Sojka, Petr and Sojka, Ondřej},
journal = {Zpravodaj Československého sdružení uživatelů TeXu},
keywords = {hyphenation; hyphenation patterns; patgen; syllabification; syllabic hyphenation; Czech; Slovak; Czechoslovak patterns; vzory dělení slov; československé dělení; efektivní segmentace; slabičné dělení pro více jazyků},
language = {eng},
number = {3-4},
pages = {118-126},
publisher = {Československé sdružení uživatelů TeXu},
title = {Towards New Czechoslovak Hyphenation Patterns},
url = {http://eudml.org/doc/298620},
volume = {030},
year = {2020},
}

TY - JOUR
AU - Sojka, Petr
AU - Sojka, Ondřej
TI - Towards New Czechoslovak Hyphenation Patterns
JO - Zpravodaj Československého sdružení uživatelů TeXu
PY - 2020
PB - Československé sdružení uživatelů TeXu
VL - 030
IS - 3-4
SP - 118
EP - 126
AB - Space- and time-effective segmentation and hyphenation of natural languages lie at the core of every document preparation system, web browser, and mobile rendering system. Recently, the unreasonable effectiveness of pattern generation has been shown: hyphenation patterns can solve the dictionary problem for a single language without compromise. In this article, we show how we applied patgen to generate new Czechoslovak hyphenation patterns that cover two languages, Czech and Slovak. We show that developing more universal hyphenation patterns is feasible and allows for significant quality improvements and space savings. We evaluate the new approach and the new Czechoslovak hyphenation patterns.
LA - eng
KW - hyphenation; hyphenation patterns; patgen; syllabification; syllabic hyphenation; Czech; Slovak; Czechoslovak patterns; vzory dělení slov; československé dělení; efektivní segmentace; slabičné dělení pro více jazyků
UR - http://eudml.org/doc/298620
ER -

References

  1. Keary, Major, On Hyphenation - Anarchy of Pedantry, PC Update, The magazine of the Melbourne PC User Group. 2005. Available also from: https://web.archive.org/web/20050310054738/http://www.melbpc.org.au/pcupdate/9100/9112article4.htm. (2005) 
  2. Marchand, Yannick, Adsett, Connie R., Damper, Robert I., Language and Speech. 2009, vol. 52, no. 1, pp. 1–27. Available from DOI: 10.1177/0023830908099881. (2009)
  3. Bartlett, Susan, Kondrak, Grzegorz, Cherry, Colin, Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion, In: Proceedings of ACL-08: HLT. Columbus, Ohio: Association for Computational Linguistics, 2008, pp. 568–576. Available also from: https://www.aclweb.org/anthology/P08-1065. (2008) 
  4. Trogkanis, Nikolaos, Elkan, Charles, Conditional Random Fields for Word Hyphenation, In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden: Association for Computational Linguistics, 2010, pp. 366–374. Available also from: https://www.aclweb.org/anthology/P10-1038. (2010) 
  5. Liang, Franklin M., Word Hy-phen-a-tion by Com-put-er, 1983. PhD thesis. Stanford University. (1983) 
  6. Shao, Yan, Hardmeier, Christian, Nivre, Joakim, Transactions of the Association for Computational Linguistics. 2018, vol. 6, pp. 421–435. Available from DOI: 10.1162/tacl_a_00033. (2018)
  7. Reutenauer, Arthur, Miklavec, Mojca, TeX hyphenation patterns, TUG, [n.d.]. Available also from: https://tug.org/tex-hyphen/. Accessed 2019-11-24. 
  8. The Oxford Spelling Dictionary, Oxford University Press, 1990. The Oxford Library of English Usage. (1990) 
  9. Webster's Third New International Dictionary of the English Language Unabridged, Springfield, Massachusetts, U.S.A: Merriam-Webster Inc., 2002. (2002) 
  10. The Chicago Manual of Style, 17th ed. Chicago: University of Chicago Press, 2017. isbn 9780226287058. (2017) 
  11. Sojka, Petr, Notes on Compound Word Hyphenation in TeX, TUGboat. 1995, vol. 16, no. 3, 290–297. Available also from: https://tug.org/TUGboat/tb16-3/tb48soj2.pdf. (1995) 
  12. Sojka, Petr, Ševeček, Pavel, Hyphenation in TeX - Quo Vadis?, TUGboat. 1995, vol. 16, no. 3, 280–289. Available also from: https://tug.org/TUGboat/tb16-3/tb48soj1.pdf. (1995) 
  13. Sojka, Petr, Hyphenation on Demand, TUGboat. 1999, vol. 20, no. 3, 241–247. Available also from: https://tug.org/TUGboat/tb20-3/tb64sojka.pdf. (1999)
  14. Sojka, Petr, (Slovak Hyphenation Patterns: A Time for Change?) CSTUG Bulletin. 2004, vol. 14, no. 3–4, 183–189. Available from DOI: 10.5300/2004-3-4/183. (2004)
  15. Sojka, Petr, Sojka, Ondřej, The Unreasonable Effectiveness of Pattern Generation, TUGboat. 2019, vol. 40, no. 2, pp. 187–193. Available also from: https://tug.org/TUGboat/tb40-2/tb125sojka-patgen.pdf. (2019) 
  16. Jakubíček, Miloš, Kilgarriff, Adam, Kovář, Vojtěch, Rychlý, Pavel, Suchomel, Vít, The TenTen Corpus Family, In: Proc. of the 7th International Corpus Linguistics Conference (CL). Lancaster, 2013, pp. 125–127. (2013)
  17. Kilgarriff, Adam, Rychlý, Pavel, Smrž, Pavel, Tugwell, David, The Sketch Engine, In: Proceedings of the Eleventh EURALEX International Congress. Lorient, France, 2004, pp. 105–116. (2004) 
  18. Sojka, Petr, Sojka, Ondřej, Zpravodaj CSTUG. 2019, vol. 29, no. 1–4, 73–86. Available from DOI: 10.5300/2019-1-4/73. (2019)
  19. Chlebíková, Jana, (How to hyphenate the word Czechoslovakia). Zpravodaj CSTUG. 1991, vol. 1, no. 4, 10–13. Available from DOI: 10.5300/1991-4/10. (1991)
  20. Sojka, Petr, Slovenské vzory dělení: čas pro změnu?, In: Proceedings of SLT 2004, 4th seminar on Linux and TEX. Znojmo: Konvoj, 2004, 67–72. Available also from: https://fi.muni.cz/usr/sojka/papers/skhyp.pdf. (2004) 
  21. Sojka, Ondřej, Sojka, Petr, cshyphen repository, [N.d.]. Available also from: https://github.com/tensojka/cshyphen. 
