SkELL Corpora as a Part of the Language Portal Sõnaveeb: Problems and Perspectives

Kristina Koppel [1]; Jelena Kallas [1]; Maria Khokhlova [2]; Vít Suchomel [3]; Vít Baisa [3]; Jan Michelfeit [3]
1. Institute of the Estonian Language, Estonia
2. St. Petersburg State University, Russia
3. Lexical Computing Ltd., Czech Republic
Published in: Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference. 1-3 October 2019, Sintra, Portugal / Iztok Kosem, Tanara Zingano Kuhn, Margarita Correia, José Pedro Ferreira, Maarten Jansen, Isabel Pereira, Jelena Kallas, Miloš Jakubíček, Simon Krek, Carole Tiberius (eds.), 2019, pp. 763-782
Language: English
Abstract
The paper analyses the quality and presentation of authentic corpus sentences from Sketch Engine for Language Learning (SkELL) corpora (Baisa & Suchomel, 2014), using the example of Sõnaveeb (Wordweb), a new language portal being developed at the Institute of the Estonian Language. Sõnaveeb currently contains 150,000 Estonian headwords, about 70,000 of which have Russian equivalents. Authentic corpus sentences are displayed for both languages. In some cases (e.g. terms, derived forms, compounds and multi-word expressions), corpus sentences are the only usage examples available on the portal. We describe the parameters of the Good Dictionary Examples (GDEX) (Kilgarriff et al., 2008) configurations for Estonian and Russian used to compile the etSkELL 2018 and ruSkELL 1.6 corpora, give an overview of the evaluation of the GDEX configuration for Estonian, and outline the requirements for a user-friendly presentation of SkELL corpora as part of the language portal.