Dialnet


Comparable corpora BootCaT

  • Authors: Adam Kilgarriff, Avinesh PVS, Jan Pomikálek
  • Source: Electronic lexicography in the 21st century: New Applications for New Users: Proceedings of eLex 2011, Bled, 10-12 November 2011 / ed. by Iztok Kosem, Karmen Kosem, 2011, pp. 122-128
  • Language: English
  • Abstract
    • The BootCaT method (Baroni and Bernardini, 2004) has proved a fast, effective and versatile approach to corpus building. The method has been applied to small specialist corpora for finding terminology and translations (as originally envisaged by Baroni and Bernardini), and to large, general corpora for large numbers of languages. First we review BootCaT and present some figures for the sizes of corpora that can be built in a few minutes under various parameter settings. To date BootCaT has not been applied multilingually. We explore this by building matching corpora for different languages from matching seeds. We consider three ways of obtaining matching seeds: manual translation, automatic translation, and finding keywords from corresponding Wikipedia articles. In one experiment, we present a bilingual word sketch based on seed translation by Google Translate. In another, the seeds are from Wikipedia, and we evaluate the corpora by seeing, firstly, how many domain terms they deliver and, secondly, how often the terms in one language are translation equivalents of the terms in the other.
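
The idea of building matching corpora from matching seeds can be illustrated with a minimal sketch. It assumes the usual BootCaT-style first step, in which seed terms are combined into random tuples that are then sent to a search engine as queries; the abstract does not give these details. The function names, the tuple size, the toy seed list and the hand-written English-to-Spanish mapping (standing in for manual translation) are all hypothetical, and the search and download steps are omitted.

```python
import itertools
import random


def make_query_tuples(seeds, tuple_size=3, n_queries=5, rng_seed=0):
    """Draw distinct random combinations of seeds to use as search queries."""
    combos = list(itertools.combinations(sorted(set(seeds)), tuple_size))
    rng = random.Random(rng_seed)  # fixed seed so runs are repeatable
    rng.shuffle(combos)
    return [" ".join(combo) for combo in combos[:n_queries]]


def matching_seeds(seeds, translations):
    """Map each seed to its equivalent, dropping seeds with no translation."""
    return [translations[s] for s in seeds if s in translations]


# Hypothetical toy data: a real run would use many more seeds.
seeds_en = ["corpus", "lexicography", "dictionary", "terminology", "collocation"]
en_to_es = {
    "corpus": "corpus",
    "lexicography": "lexicografía",
    "dictionary": "diccionario",
    "terminology": "terminología",
    "collocation": "colocación",
}

seeds_es = matching_seeds(seeds_en, en_to_es)
queries_en = make_query_tuples(seeds_en)
queries_es = make_query_tuples(seeds_es)

for query in queries_en + queries_es:
    print(query)
```

Each printed line is one query; the pages retrieved for the English queries and for the Spanish queries would form the two corpora to be compared.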

