Developing an algorithm for automatic recognition of acronyms and expanding acronyms in electronic texts

Authors

  • Mojca Kompara, Oddelek za slovenistiko, Fakulteta za humanistične študije Univerze na Primorskem

DOI:

https://doi.org/10.3986/jz.v17i2.2379

Keywords:

acronyms, expansion, algorithms

Abstract

This article presents the development of an algorithm for automatically recognizing acronyms and their expansions in electronic Slovenian texts. Recognition takes place at the lexical level and relies on the properties of acronyms, the properties of their expansions, and the correspondence between the two. The algorithm identifies acronyms using recognition principles and searches the surrounding context for their expanded forms using principles of correspondence. The algorithm was applied by filtering five years of the newspaper Delo, from which 5,820 potential acronym-expansion pairs were extracted in 30 minutes and then cleaned up manually. The accuracy of the algorithm is 96.75 percent.
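
The general idea of pairing an acronym with an expansion found in its context can be illustrated with a minimal sketch. This is not the algorithm developed in the article, whose recognition and correspondence principles are not given in this abstract. It assumes, purely for illustration, that an acronym is a run of two to six uppercase letters in parentheses and that its expansion is the run of immediately preceding words whose initials spell it. The names `ACRONYM_IN_PARENS` and `find_pairs` are hypothetical.

```python
import re

# Assumption for illustration only: an acronym is 2-6 uppercase letters
# (including Slovenian Č, Š, Ž) enclosed in parentheses.
ACRONYM_IN_PARENS = re.compile(r"\(([A-ZČŠŽ]{2,6})\)")


def find_pairs(text):
    """Return (acronym, expansion) pairs in which the words directly
    before a parenthesized acronym have initials that spell it."""
    pairs = []
    for match in ACRONYM_IN_PARENS.finditer(text):
        acronym = match.group(1)
        # Words preceding the opening parenthesis, punctuation dropped.
        words = re.findall(r"\w+", text[:match.start()])
        if len(words) < len(acronym):
            continue  # not enough context to hold an expansion
        candidate = words[-len(acronym):]
        initials = "".join(word[0].upper() for word in candidate)
        # Simplified correspondence check: one word per acronym letter.
        if initials == acronym:
            pairs.append((acronym, " ".join(candidate)))
    return pairs


if __name__ == "__main__":
    sample = "Delo poroča, da je Evropska unija (EU) sprejela sklep."
    print(find_pairs(sample))
```

A one-word-per-letter check like this misses expansions that contain unabbreviated function words or that follow the acronym rather than precede it, so it should be read only as a picture of the matching step.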

Published

2015-07-28

How to Cite

Kompara, M. (2015). Developing an algorithm for automatic recognition of acronyms and expanding acronyms in electronic texts. Jezikoslovni Zapiski, 17(2). https://doi.org/10.3986/jz.v17i2.2379