


Abstract of Comparable corpora BootCaT

Adam Kilgarriff, Avinesh PVS, Jan Pomikálek

  • The BootCaT method (Baroni and Bernardini, 2004) has proved to be a fast, effective and versatile approach to corpus building. The method has been applied to small specialist corpora for finding terminology and translations (as originally envisaged by Baroni and Bernardini), and to large, general corpora for many languages. First we review BootCaT and present some figures for the sizes of corpora that can be built in a few minutes under various parameter settings. To date, BootCaT has not been applied multilingually. We explore this by building matching corpora for different languages from matching seeds. We consider three ways of obtaining matching seeds: manual translation, automatic translation, and extraction of keywords from corresponding Wikipedia articles. In one experiment, we present a bilingual word sketch based on seeds translated by Google Translate. In another, the seeds come from Wikipedia, and we evaluate the corpora in two ways: first, by how many domain terms they deliver, and second, by how often the terms in one language are translation equivalents of the terms in the other.

