Semantic approach for building generated virtual-parallel corpora from monolingual texts

    1. [1] Polish-Japanese Academy of Information Technology, Warsaw, Poland

  • Published in: Poznan Studies in Contemporary Linguistics, ISSN 1732-0747, e-ISSN 1897-7499, Vol. 55, No. 2, 2019 (special issue: Current state of the art in language technology for Polish), pp. 469-490
  • Language: English
  • Full text not available
  • Abstract
    • Several natural languages have been processed extensively, yet the problem of limited textual linguistic resources remains. Creating parallel corpora manually is expensive and time-consuming, and the language data required for statistical machine translation (SMT) do not exist in quantities large enough for their statistical information to serve as a starting point for research. Known approaches that build parallel resources from multiple sources, such as comparable or quasi-comparable corpora, are complex and produce noisy output that needs further processing and in-domain adaptation. Moreover, to optimize the performance of comparable-corpus mining algorithms, a high-quality parallel corpus is essential for training a good data classifier. In this research, we developed a methodology for generating accurate parallel corpora (Czech-English and Polish-English) from monolingual resources by calculating the compatibility between the outputs of three machine translation systems. Large monolingual resources were translated with multiple translation systems, and translation compatibility was measured strictly using rules based on the Levenshtein distance. The approach produced very favorable results: the generated corpora improved the quality of SMT systems and appear useful for many other natural language processing tasks.
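The abstract does not state the exact compatibility rules, so the following is only a minimal Python sketch of the general idea under explicit assumptions: each monolingual source sentence is translated by three MT systems, the outputs are compared with a normalized Levenshtein similarity, and the sentence pair is kept only if every pair of outputs agrees at or above a threshold. The function names (`select_translation`, `build_virtual_parallel`), the threshold of 0.75, the all-pairs agreement rule, and the choice of the most central (medoid) output are all hypothetical illustrations, not the authors' method.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / length of longer string."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def select_translation(candidates, threshold=0.75):
    """HYPOTHETICAL rule: accept only if all MT outputs agree pairwise at or
    above `threshold`; then return the output most similar to the others."""
    if len(candidates) < 2:
        raise ValueError("need at least two MT outputs to compare")
    n = len(candidates)
    sims = {}
    for i, j in combinations(range(n), 2):
        s = similarity(candidates[i].lower(), candidates[j].lower())
        sims[(i, j)] = sims[(j, i)] = s
    if any(s < threshold for s in sims.values()):
        return None  # systems disagree: discard this sentence
    # Medoid choice (an assumption): the candidate with the highest total agreement.
    best = max(range(n), key=lambda k: sum(sims[(k, m)] for m in range(n) if m != k))
    return candidates[best]


def build_virtual_parallel(source_sentences, outputs_per_system, threshold=0.75):
    """Pair each source sentence with a consensus translation, if one exists."""
    corpus = []
    for idx, src in enumerate(source_sentences):
        candidates = [system[idx] for system in outputs_per_system]
        chosen = select_translation(candidates, threshold)
        if chosen is not None:
            corpus.append((src, chosen))
    return corpus


if __name__ == "__main__":
    # Toy Polish source with outputs from three (imaginary) MT systems.
    src = ["Ala ma kota.", "Dzień dobry."]
    mt_a = ["Ala has a cat.", "Good morning."]
    mt_b = ["Ala has a cat.", "Hello."]
    mt_c = ["Ala has got a cat.", "Good day."]
    print(build_virtual_parallel(src, [mt_a, mt_b, mt_c], threshold=0.75))
    # → [('Ala ma kota.', 'Ala has a cat.')]
```

In this toy run the first sentence is kept because all three outputs are close (the lowest pairwise similarity is 1 - 4/18 ≈ 0.78), while the second is dropped because "Good morning." and "Hello." diverge strongly. Raising the threshold trades corpus size for precision, which is the kind of strictness the abstract alludes to.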
