Recognizing Abbreviations and Their Expansions in English Criminal Justice and Security Texts Using the Krajšavar Algorithm, ChatGPT, and Perplexity
DOI:
https://doi.org/10.3986/JZ.31.2.05
Keywords:
abbreviations, English, dictionary, algorithm, artificial intelligence
Abstract
This article examines the application of artificial intelligence tools, specifically ChatGPT and Perplexity, for recognizing abbreviations and their corresponding expansions in English professional texts dealing with criminal justice and security. The results are compared with those obtained through text filtering using the Krajšavar algorithm. The English texts used for filtering with both AI tools and the algorithm were manually collected based on a typological classification of English criminal justice and security texts. This article presents the main features of the Krajšavar algorithm, outlines its development and functioning, and describes the text collection and material preparation for filtering. The analysis focuses on using ChatGPT and Perplexity for the automatic recognition of abbreviation–expansion pairs in English criminal justice and security texts. The filtering results are evaluated and subsequently compared with those obtained through the Krajšavar algorithm and manual verification.
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors guarantee that the work is their own original creation and does not infringe any statutory or common-law copyright or any proprietary right of any third party. In the event of claims by third parties, authors commit themselves to defending the interests of the publisher and shall cover any potential costs.
