In this article, we compare two methods for integrating a specific class of multiword expressions (MWEs), Verb+Noun collocations, into a French-Romanian lexical alignment tool. Our experiments use a French-Romanian parallel corpus from the legal domain, which is tokenized, tagged, lemmatized and chunked. The first method uses a dictionary-based approach to align Verb+Noun collocations. The second method proposes an alignment algorithm that uses a set of MWE candidates previously extracted from the monolingual part of the training corpus. These candidates were detected by a hybrid extraction method combining statistical measures and linguistic filters. The best results were obtained with the hybrid method.