Lexical Tools for Low-Resource Languages: A Livonian Case-Study

Valts Ernštreits

University of Latvia, Latvia
Published in: Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference. 1-3 October 2019, Sintra, Portugal / Iztok Kosem (ed.), Tanara Zingano Kuhn (ed.), Margarita Correia (ed.), José Pedro Ferreira (ed.), Maarten Jansen (ed.), Isabel Pereira (ed.), Jelena Kallas (ed.), Miloš Jakubíček (ed.), Simon Krek (ed.), Carole Tiberius (ed.), 2019, pp. 161-170
Language: English
Abstract
This article focuses on the empirical experience and conclusions resulting from the creation of language research and acquisition tools for Livonian, one of the smallest languages in Europe. For Livonian, a cluster of three interconnected databases was created, each holding a distinct type of data: lexical, morphological, and corpus. The lexical database contains the lemmas and their data, the morphological database stores morphological forms, and the corpus holds all textual material, including the dictionary examples. When the corpus is indexed, every word is linked to a lemma in the lexical database and to its morphological information (new lemmas are added before indexing). This ensures the consistency of the language data and makes the full data set of the other databases accessible from each database. The purpose of such a cluster is to extract the maximum amount of information from limited data sources. Technologies designed for languages with many speakers use quantitative methods and automation to extract qualitative information from a large and constantly expanding body of linguistic data; technologies designed for small languages must extract the same type of information from a limited and largely static data set. The article also examines the problems faced when working with scarce resources (inadequate language data, insufficient personnel, lack of rules for automating processes, etc.) and the methods used to resolve them in the case of Livonian.
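The linking rules summarized above (corpus tokens point to a morphological form, forms point to a lemma, and indexing requires the lemma to exist first) can be illustrated with a minimal in-memory sketch. It is not the Livonian system itself: the abstract does not describe the actual schema, so the class and field names (`Lemma`, `Form`, `Token`, `Cluster`, the `features` tag string) and the method names are hypothetical assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Lemma:
    # Entry in the (hypothetical) lexical database.
    lemma_id: int
    headword: str
    pos: str


@dataclass(frozen=True)
class Form:
    # Entry in the (hypothetical) morphological database; always tied to a lemma.
    form_id: int
    lemma_id: int
    surface: str
    features: str  # morphological tag string; the tag set is an assumption


@dataclass(frozen=True)
class Token:
    # One indexed word in the (hypothetical) corpus; always tied to a form.
    text_id: str
    position: int
    surface: str
    form_id: int


class Cluster:
    """Hypothetical model of three linked databases: lexical, morphological, corpus."""

    def __init__(self) -> None:
        self.lemmas: dict[int, Lemma] = {}
        self.forms: dict[int, Form] = {}
        self.tokens: list[Token] = []
        # Reverse indexes so that each database can reach the data of the others.
        self._forms_by_lemma: defaultdict[int, list[Form]] = defaultdict(list)
        self._tokens_by_form: defaultdict[int, list[Token]] = defaultdict(list)

    def add_lemma(self, lemma: Lemma) -> None:
        self.lemmas[lemma.lemma_id] = lemma

    def add_form(self, form: Form) -> None:
        # A form cannot exist without its lemma in the lexical database.
        if form.lemma_id not in self.lemmas:
            raise KeyError(f"unknown lemma {form.lemma_id}")
        self.forms[form.form_id] = form
        self._forms_by_lemma[form.lemma_id].append(form)

    def index_token(self, text_id: str, position: int, surface: str, form_id: int) -> Token:
        # Indexing is refused for words whose form (and hence lemma) is not yet
        # registered, mirroring the rule that new lemmas are added before indexing.
        form = self.forms.get(form_id)
        if form is None:
            raise KeyError(f"unknown form {form_id}")
        # Case-insensitive check, e.g. for sentence-initial capitals.
        if form.surface.casefold() != surface.casefold():
            raise ValueError(f"{surface!r} does not match form {form.surface!r}")
        token = Token(text_id, position, surface, form_id)
        self.tokens.append(token)
        self._tokens_by_form[form_id].append(token)
        return token

    # Cross-database lookups.
    def lemma_of(self, token: Token) -> Lemma:
        return self.lemmas[self.forms[token.form_id].lemma_id]

    def forms_of(self, lemma_id: int) -> list[Form]:
        return list(self._forms_by_lemma[lemma_id])

    def occurrences(self, lemma_id: int) -> list[Token]:
        return [
            token
            for form in self._forms_by_lemma[lemma_id]
            for token in self._tokens_by_form[form.form_id]
        ]
```

In this sketch, calling `index_token` for a word whose form has not been registered raises `KeyError`, so the lemma and its form must be entered first. This is one simple way to enforce the consistency the abstract describes; a real system would more likely use foreign-key constraints in a relational database.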