A corpus-based study of 16th-century Slovene clitics and clitic-like elements

Authors

  • Alenka Jelovšek ZRC SAZU, Fran Ramovš Institute of the Slovenian Language
  • Tomaž Erjavec Jožef Stefan Institute

DOI:

https://doi.org/10.3986/sjsls.12.1.01

Abstract

This paper presents a corpus-based linguistic investigation of spelling variation in 16th-century Slovene from both the diachronic and synchronic points of view. The investigation is based on a manually annotated sample (approx. 14,000 word tokens) from Primož Trubar’s Ta pervi deil tiga Noviga teſtamenta, 1557, and Hiſhna poſtilla, 1595 (TPo 1595), and Jurij Juričič’s Poſtilla, 1578 (JPo 1578), and it concentrates on clitics and clitic-like elements. The statistical analysis compares the spelling conventions of the early modern period with those of contemporary Slovene using normalised forms of the originals, observing cases where one original orthographic word is nowadays written as two or more words (1–n mapping) or vice versa (n–1 mapping). It shows that the overall percentage of split and joined word tokens is 5.7%, with JPo 1578 having the highest percentage and TPo 1595 the lowest, less than half that of JPo 1578. Of these, the vast majority are cases where a word is now split. The most frequent bound words are the non-syllabic prepositions v ‘in(to)’, k ‘to’, and z ‘with’, followed by the negative proclitic ne ‘not’, the enclitic particle li ‘whether, if’ and, in rare instances, the conditional particle bi, the reflexive particle se, and the prepositions na ‘on’, ob ‘at, by’, pri ‘at, beside’ and za ‘for, behind’. The absolute numbers of specific clitics partially correlate with the prevalence of their bound variants over their freestanding variants: the most frequent clitics are predominantly bound, while the least frequent are predominantly freestanding. Individual instances of two accented words written together can be attributed to German influence (figino_drevo, der Feigenbaum ‘fig tree’).
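The 1–n and n–1 mappings described above can be illustrated with a minimal Python sketch. It assumes the sample is available as aligned pairs of (original form, normalised form) and that the percentage is taken over those aligned units; both the data layout and the function names are assumptions made for illustration, not the authors' actual pipeline. The toy pairs are invented placeholder strings, not attested forms from the corpus.

```python
from collections import Counter


def mapping_type(original: str, normalised: str) -> str:
    """Classify one aligned pair by the number of orthographic words on each side."""
    n_orig = len(original.split())
    n_norm = len(normalised.split())
    if n_orig == 1 and n_norm > 1:
        return "1-n"  # one original word, nowadays written as several (split)
    if n_orig > 1 and n_norm == 1:
        return "n-1"  # several original words, nowadays written as one (joined)
    if n_orig == 1 and n_norm == 1:
        return "1-1"  # word boundaries unchanged
    return "other"    # anything else, e.g. two words on both sides


def mapping_stats(pairs):
    """Return the per-type counts and the percentage of split or joined units."""
    counts = Counter(mapping_type(orig, norm) for orig, norm in pairs)
    total = sum(counts.values())
    changed = counts["1-n"] + counts["n-1"]
    share = 100.0 * changed / total if total else 0.0
    return counts, share


# Hypothetical toy data (invented for illustration only).
PAIRS = [
    ("vhishi", "v hiši"),        # bound preposition: 1-n
    ("nar vezhi", "največji"),   # separated superlative prefix: n-1
    ("inu", "in"),               # 1-1
    ("beſſeda", "beseda"),       # 1-1
]

if __name__ == "__main__":
    counts, share = mapping_stats(PAIRS)
    print(dict(counts), f"{share:.1f}% split or joined")
```

Running the same function separately on the pairs from each book would give per-book percentages of the kind the abstract compares.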

The cases where one modernised word corresponds to two original words are sporadic or can be identified as errors in the original books, with the exception of the superlative adjective/adverb prefix naj-/nar- ‘the most’, which is orthographically bound with its root in about 25% of instances. Also of interest are cases in which word beginnings that are homonymous with non-syllabic or monosyllabic prepositions are separated from the remainder of the word, in some cases with an apostrophe (e.g. s’_nameinja ‘signs’, s’_derſhati ‘to endure’, do_bruta ‘goodness’, sa_doſti ‘enough’). The normalisation also enables the identification of the orthographical variants of the most commonly bound clitics, i.e. the non-syllabic prepositions k, z and v. K and its allomorph /h/ have 5 attested spelling variants, of which one, <q_>, is limited to hosts starting with v-. For z, with a voiced allomorph /z/ and a voiceless allomorph /s/, three variant spellings were found that only partially correspond to the voiceless/voiced distinction of the initial sound of the host word, and cases of merging with a host that begins with s-/z- were also identified. Additional positional spellings probably represent other allomorphs: <sh/ſh/s’h> for palatalized /ž/ in front of a palatal ń, and <ſa>, <ſo/so> for syllabified /za/, /zo/. The preposition v shows the highest degree of orthographical variation of all the analysed words, with 10 different spellings: general bound <v_> and <u_> and freestanding <v’_>; <uv_>, <uv’_> and <u’_v> in front of a vowel; <u’_> and <va_> attested only in front of v-; and <v_> and <v’> merged with the initial v- of the host.

The analysis of spelling variation in non-syllabic prepositions showed that even a relatively limited hand-corrected annotated sample enabled the identification of the majority of the spelling variants identified in previous works, while the use of the noSketch Engine tool yielded further information about their relative frequency and distribution. As the hand-corrected corpus is expanded, such research will yield even more relevant information for the study of the 16th-century Slovene literary language and, with a large amount of statistically relevant data, will significantly supplement existing findings based on traditionally collected examples.
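As a rough illustration of how the relative frequency and distribution of spelling variants might be tallied, the following Python sketch counts the spellings of each normalised clitic and lists the host-initial letters each spelling occurs before. It is a plain standard-library tally, not the noSketch Engine workflow used in the study. The input layout (a token with `_` marking the clitic/host boundary, paired with the normalised clitic) and all function names are assumptions, and the tokens are invented placeholders rather than attested corpus forms.

```python
from collections import Counter, defaultdict


def split_token(marked: str):
    """Split a boundary-marked token such as 'v_hishi' into (spelling, host)."""
    spelling, sep, host = marked.partition("_")
    if not sep or not host:
        raise ValueError(f"no clitic/host boundary marked in {marked!r}")
    return spelling, host


def variant_table(tokens):
    """Count the spelling variants observed for each normalised clitic."""
    by_clitic = defaultdict(Counter)
    for marked, clitic in tokens:
        spelling, _host = split_token(marked)
        by_clitic[clitic][spelling] += 1
    return by_clitic


def relative_frequencies(counter):
    """Turn absolute counts into shares that sum to 1."""
    total = sum(counter.values())
    return {spelling: n / total for spelling, n in counter.items()}


def host_initials(tokens, clitic, spelling):
    """List the host-initial letters that a given spelling is found before."""
    initials = set()
    for marked, norm in tokens:
        found, host = split_token(marked)
        if norm == clitic and found == spelling:
            initials.add(host[0].lower())
    return sorted(initials)


# Hypothetical toy data (invented for illustration only).
TOKENS = [
    ("v_hishi", "v"),
    ("v_meſti", "v"),
    ("u_hishi", "v"),
    ("uv_ozheh", "v"),
    ("k_nemu", "k"),
]

if __name__ == "__main__":
    table = variant_table(TOKENS)
    print(relative_frequencies(table["v"]))
    print(host_initials(TOKENS, "v", "uv"))
```

Grouping by host-initial letter is one simple way to check positional restrictions of the kind reported above, such as a spelling that appears only before vowels or only before v-.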

References

Ahačič, Kozma. 2014. The History of Linguistic Thought and Language Use in 16th Century Slovenia. Frankfurt am Main [etc.]: Peter Lang.

Ahačič, Kozma. Legan Ravnikar, Andreja. Merše, Majda. Narat, Jožica. Novak, France. 2011. Besedje slovenskega knjižnega jezika 16. stoletja. Ljubljana: Založba ZRC, ZRC SAZU.

Eckart de Castilho, R. Biemann, C. Gurevych, I. Yimam, S.M. 2014. WebAnno: a flexible, web-based annotation tool for CLARIN. Proceedings of the CLARIN Annual Conference (CAC) 2014, Soesterberg, Netherlands.

Erjavec, Tomaž. 2015. Reference corpus of historical Slovene goo300k 1.2. Slovenian language resource repository CLARIN.SI, http://hdl.handle.net/11356/1025.

Erjavec, Tomaž. Jelovšek, Alenka. 2013. A corpus-based diachronic analysis of Slovene clitics. New Methods in Historical Corpora, 117–26. Tübingen: Narr Verlag.

Juričič, Jurij. 1578. Poſtilla. Ljubljana. Digital edition: Korpus slovenskega knjižnega jezika 16. stoletja, Jurij Juričič, Poſtilla [https://stage.termania.net/korpus16/].

Legan Ravnikar, Andreja. 2017. K problematiki vpliva stičnega jezika – nemščine na semantične spremembe in stilno vrednost najstarejše slovenske knjižne leksike (16. stoletje). Slovenski jezik = Slovene Linguistic Studies 11: 35–53.

Ljubešić, Nikola. Zupan, Katja. Fišer, Darja. Erjavec, Tomaž. 2016. Normalising Slovene data: historical texts vs. user-generated content. Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), 146–155.

Merše, Majda. Jakopin, Franc. Novak, France. 1992. Fonološki sistem knjižnega jezika slovenskih protestantov. Slavistična revija 40/4: 321–340.

Neweklowsky, Gerhard. 1985. Das Werden der slowenischen Schriftsprache. Entstehung von Sprachen und Völkern. Glotto- und ethnogenetische Aspekte europäischer Sprachen, 391–402. Tübingen: Max Niemeyer Verlag.

Novak, France. 2006. Predponi v- in u- v jeziku slovenskih protestantskih piscev 16. stoletja. Stati inu obstati 3–4: 138–159.

Novak, France. 2011. Predlog v v slovenskem knjižnem jeziku 16. stoletja. Globinska moč besede: red. prof. dr. Martini Orožen ob 80-letnici, 126–142. Maribor: Mednarodna založba Oddelka za slovanske jezike in književnosti, Filozofska fakulteta.

Pflughaupt, Laurent. 2008. Letter by Letter: An Alphabetical Miscellany. Trans. Gregory Bruhn. New York: Princeton Architectural Press.

Rigler, Jakob. 1968. Začetki slovenskega knjižnega jezika. Ljubljana: Slovenska akademija znanosti in umetnosti.

Scherrer, Yves. Ljubešić, Nikola. 2016. Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation. Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016), 248–255.

Toporišič, Jože. 2008/2009. »/S/ledni Sazhetig ie Teßhak inu nepopelnom«. Slavistična revija (Trubarjeva številka), letn. 56/57, št. 4, 1: 191–198.

Trubar, Primož. 1557. Ta pervi deil tiga Noviga teſtamenta. Tübingen. Digital edition: Korpus slovenskega knjižnega jezika 16. stoletja, Primož Trubar, Ta pervi deil tiga Noviga teſtamenta [https://stage.termania.net/korpus16/].

Trubar, Primož. 1595. Hiſhna poſtilla. Tübingen. Digital edition: Korpus slovenskega knjižnega jezika 16. stoletja, Primož Trubar, Hiſhna poſtilla [https://stage.termania.net/korpus16/].

How to Cite

Jelovšek, A., & Erjavec, T. (2019). A corpus-based study of 16th-century Slovene clitics and clitic-like elements. Slovenski Jezik / Slovene Linguistic Studies, 12, 3–19. https://doi.org/10.3986/sjsls.12.1.01

Section

Articles