
Dialnet


Abstract of New statistical and syntactic models for machine translation

Maxim Khalilov

    […] statistical machine translation (SMT) technology, which is currently considered the best way to perform MT of natural languages.

    The main goal of this thesis is to enhance the classical SMT models by introducing syntactic knowledge into a pre-translation step that reorders the source side of the corpus. In particular, we are interested in the value of syntax for reordering between languages with high word order disparity. A secondary objective is to determine the potential of different language model (LM) enhancement techniques to improve the performance and efficiency of SMT systems.

    We start with a comprehensive study of the state of the art in SMT, describing the fundamental models underlying the translation process and briefly reviewing the main methods for automatic evaluation of translation quality. We emphasize phrase-based and N-gram-based SMT and analyze the major differences between these two approaches.

    Subsequently, we concentrate on language modeling methods that have received little attention in the SMT community. We report on experiments adapting an N-gram-based SMT system to a speech transcription task. We describe the positive impact of careful cut-off threshold selection on both model size and LM noisiness. Finally, we present a continuous-space LM estimated in the form of an artificial neural network.
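
    The following minimal Python sketch illustrates how count cut-offs shrink an n-gram LM. It is not the procedure used in the thesis. It only counts n-grams and discards those below a per-order minimum count, with no smoothing or renormalization. The toy corpus and the thresholds are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_ngrams(sentences: List[List[str]], order: int) -> Dict[int, Counter]:
    """Count every n-gram of length 1..order, adding sentence-boundary markers."""
    counts: Dict[int, Counter] = {n: Counter() for n in range(1, order + 1)}
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(padded) - n + 1):
                counts[n][tuple(padded[i:i + n])] += 1
    return counts


def apply_cutoffs(counts: Dict[int, Counter], min_count: Dict[int, int]) -> Dict[int, Counter]:
    """Keep only n-grams whose count reaches the minimum for their order (default 1)."""
    return {
        n: Counter({gram: c for gram, c in table.items() if c >= min_count.get(n, 1)})
        for n, table in counts.items()
    }


def model_size(counts: Dict[int, Counter]) -> int:
    """Number of distinct n-grams stored across all orders."""
    return sum(len(table) for table in counts.values())


corpus = [line.split() for line in [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "a dog sat on the mat",
]]
full = count_ngrams(corpus, order=3)
pruned = apply_cutoffs(full, {2: 2, 3: 2})  # hypothetical thresholds; unigrams are kept
print(model_size(full), "->", model_size(pruned))  # 33 -> 23
```

    Only singleton bigrams and trigrams are dropped here. Whether such rare n-grams are mostly noise is the empirical question the thesis studies on real data.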

    Moreover, we propose a novel syntax-based approach to the fundamental problem of word ordering in SMT that exploits syntactic representations of the source and target texts. Augmenting SMT with a syntax-based reordering step prior to translation is an idea proposed in recent years. It has been quite successful in improving translation quality, especially between languages with high word order disparity.

    We provide the reader with a thorough study of state-of-the-art reordering techniques and introduce a new classification of reordering algorithms for SMT. We then propose a new deterministic reordering strategy based on a syntactically augmented alignment of source and target texts and on automatically extracted hierarchical reordering patterns. Next, we couple the novel reordering module with decoding in a non-deterministic way, with the goal of effectively tackling both global and local reordering dependencies. Finally, we propose a novel translation unit blending scheme that combines bilingual tuples extracted from the parallel corpora with monotone and reordered source sides.
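
    The idea of reordering the source side with hierarchical patterns before translation can be sketched in Python as a bottom-up rewrite of a parse tree. In the thesis the patterns are extracted automatically from syntactically augmented alignments. The pattern table `RULES`, the `Node` and `leaf` helpers, and the English example are hypothetical and hand-written for illustration. The sketch produces a single reordered source sentence and does not model how the reordering module is coupled with decoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A constituency-tree node; a leaf has a POS tag as label and a word."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""

    def leaves(self) -> List[str]:
        if not self.children:
            return [self.word]
        return [w for child in self.children for w in child.leaves()]


def leaf(tag: str, word: str) -> Node:
    return Node(tag, [], word)


# Hypothetical patterns: (parent label, child labels) -> new order of the children.
RULES: Dict[Tuple[str, Tuple[str, ...]], Tuple[int, ...]] = {
    ("NP", ("DT", "JJ", "NN")): (0, 2, 1),  # "the red car" -> Spanish-like "the car red"
    ("NP", ("JJ", "NN")): (1, 0),
}


def reorder(node: Node, rules) -> Node:
    """Rewrite the tree bottom-up, permuting children wherever a pattern matches."""
    if not node.children:
        return node
    kids = [reorder(child, rules) for child in node.children]
    key = (node.label, tuple(child.label for child in node.children))
    perm = rules.get(key)
    if perm is not None:
        kids = [kids[i] for i in perm]
    return Node(node.label, kids)


tree = Node("S", [
    Node("NP", [leaf("DT", "the"), leaf("JJ", "red"), leaf("NN", "car")]),
    Node("VP", [leaf("VBD", "stopped")]),
])
print(" ".join(reorder(tree, RULES).leaves()))  # the car red stopped
```

    Because the rewrite recurses through the tree, a pattern that matches a high-level constituent moves whole subtrees at once. This is what lets syntax address long-distance (global) reordering.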

    The experiments are carried out on N-gram-based and phrase-based SMT systems. We contrast the results with those produced by state-of-the-art reordering algorithms and show that our methods improve over alternative distortion models.
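
    For reference, the following Python sketch shows the simple distance-based distortion cost commonly used in phrase-based decoders. Each jump between consecutively translated source phrases is penalized by its length, with no regard to syntax. This is a generic illustration, not necessarily one of the baselines compared in the thesis. The span encoding and the example phrase segmentations are assumptions.

```python
from typing import List, Tuple

# (first, last) source positions covered by a phrase, 0-based and inclusive.
Span = Tuple[int, int]


def distortion_cost(spans: List[Span]) -> int:
    """Distance-based distortion: sum of |start_i - end_{i-1} - 1| over phrases
    listed in target (output) order. A monotone translation costs 0."""
    total = 0
    prev_end = -1  # position just before the first source word
    for start, end in spans:
        total += abs(start - prev_end - 1)
        prev_end = end
    return total


monotone = [(0, 1), (2, 2), (3, 4)]
swapped = [(0, 0), (2, 2), (1, 1), (3, 4)]
print(distortion_cost(monotone), distortion_cost(swapped))  # 0 4
```

    A log-linear decoder weights this cost like any other feature. It makes every long jump expensive, which is why syntax-aware reordering can help for language pairs with large word order differences.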

    The major conclusion of the thesis is that syntactic information is useful for handling global reordering. Exploiting it yields better MT performance than the standard phrase-based and N-gram-based models.

