Predicting corpus example quality via supervised machine learning

    1. [1] University of Zagreb, Croatia

  • Published in: Electronic lexicography in the 21st century: linking lexical data in the digital age: proceedings of the eLex 2015 conference, 11-13 August 2015, Herstmonceux Castle, United Kingdom / Iztok Kosem (ed.), Miloš Jakubíček (ed.), Jelena Kallas (ed.), Simon Krek (ed.), 2015, ISBN 978-961-93594-3-3, pp. 477-485
  • Language: English
  • Abstract
    • In this paper we present a supervised-learning approach to extracting good dictionary examples from corpora. We train our quality predictor on a dataset of corpus examples annotated with a four-level ordinal variable, ranging from a very bad to a very good example. Each example is formally described by 23 variables, and the dependence of example quality on these variables is modelled with a regression model. Evaluating the ranked results for each collocation in the annotated dataset, we obtain a precision of ~80% on the 10 top-ranked examples and ~90% on the three top-ranked examples. Our approach is also highly language independent: once the language-dependent and knowledge-source-dependent features are removed, it suffers almost no loss on the 10 top-ranked examples and a loss of ~4% on the three top-ranked examples.
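
The evaluation described in the abstract, precision on the top-ranked examples for each collocation, can be illustrated with a minimal sketch. This is not the authors' code: the function name `precision_at_k`, the toy collocations and scores, and the choice to count annotation levels 3 and 4 as "good" are all assumptions made for illustration. The abstract does not state how the four-level scale is mapped to good and bad examples.

```python
from collections import defaultdict


def precision_at_k(examples, k, good_threshold=3):
    """Mean precision@k over collocations.

    examples: iterable of (collocation, predicted_score, gold_label) tuples,
    where gold_label is the four-level annotation (1 = very bad, 4 = very good).
    good_threshold is an ASSUMPTION: labels >= 3 are treated as good examples.
    """
    by_collocation = defaultdict(list)
    for collocation, score, label in examples:
        by_collocation[collocation].append((score, label))

    precisions = []
    for items in by_collocation.values():
        # Rank this collocation's examples by predicted quality, best first.
        top = sorted(items, key=lambda pair: pair[0], reverse=True)[:k]
        hits = sum(1 for _, label in top if label >= good_threshold)
        precisions.append(hits / len(top))
    return sum(precisions) / len(precisions)


# Hypothetical toy data: (collocation, regression score, annotated quality).
examples = [
    ("strong tea", 3.6, 4),
    ("strong tea", 2.9, 3),
    ("strong tea", 2.1, 1),
    ("strong tea", 1.2, 2),
    ("heavy rain", 3.1, 2),
    ("heavy rain", 2.8, 4),
    ("heavy rain", 1.5, 1),
]

print(precision_at_k(examples, k=2))
```

With real data, the predicted scores would come from the trained regression model, and `k` would be set to 10 or 3 to match the figures reported in the abstract.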

