A Statistical Analysis of the Pronunciation of the Letter l in the Sloleks Morphological Lexicon of Slovene
DOI: https://doi.org/10.3986/JZ.31.1.03
Keywords: pronunciation of the letter l, grapheme-to-phoneme conversion, morphological lexicon, Slovenian, statistical analysis
Abstract
The ambiguous pronunciation of the letter l before consonant graphemes (e.g., polža, alge, volilca) poses a problem for grapheme-to-phoneme conversion in Slovene. Several Slovene language resources have addressed the issue, but it remains unresolved. Empirical, machine-readable data on how native speakers pronounce these words is lacking. To fill this gap, approximately six thousand lexemes from the Sloleks Morphological Lexicon of Slovene, together with their inflected forms, were manually annotated for the pronunciation of l (/l/, /u̯/, or both). A statistical analysis was then performed to identify the most problematic cases. The findings will inform the development of a model that predicts the pronunciation of the letter l before a consonant grapheme.
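The selection criterion described in the abstract is the letter l immediately followed by a consonant grapheme. It can be illustrated with a short Python sketch. This is not the procedure used to build the dataset. The following choices are assumptions made for illustration: restricting the consonant set to the native Slovene alphabet, optionally treating "lj" as a digraph rather than l + consonant, and the function names `preconsonant_l_positions` and `has_preconsonant_l`.

```python
import unicodedata

# Consonant graphemes of the native Slovene alphabet (assumption: foreign
# letters such as q, w, x, y are ignored in this sketch).
SLOVENE_CONSONANTS = set("bcčdfghjklmnprsštvzž")


def preconsonant_l_positions(form: str, skip_lj: bool = True) -> list[int]:
    """Return the indices of every l immediately followed by a consonant grapheme.

    Hypothetical helper: word-final l (e.g. "bil") is not matched, because the
    criterion only covers l before a consonant grapheme.
    """
    # Normalize to NFC so that letters such as "ž" are single code points.
    word = unicodedata.normalize("NFC", form).lower()
    positions = []
    for i, (cur, nxt) in enumerate(zip(word, word[1:])):
        if cur != "l" or nxt not in SLOVENE_CONSONANTS:
            continue
        if skip_lj and nxt == "j":
            # Assumption: "lj" is treated as a digraph, not as l + consonant.
            continue
        positions.append(i)
    return positions


def has_preconsonant_l(form: str) -> bool:
    """True if the word form contains at least one pre-consonant l."""
    return bool(preconsonant_l_positions(form))


if __name__ == "__main__":
    forms = ["polža", "alge", "volilca", "bil", "ljubljana"]
    print([f for f in forms if has_preconsonant_l(f)])
    # ['polža', 'alge', 'volilca']
```

In "volilca" only the second l is matched, because the first one precedes a vowel. Exposing the "lj" treatment as a parameter leaves that decision open instead of hard-coding it.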
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors guarantee that the work is their own original creation and does not infringe any statutory or common-law copyright or any proprietary right of any third party. In the event of claims by third parties, authors commit themselves to defending the interests of the publisher and shall cover any potential costs.
