Recent improvements in neural machine translation (NMT) have driven a shift from statistical MT (SMT) to NMT and propelled the use of post-editing (PE) in translation workflows. However, many professional translators state that when the quality of the MT output is not good enough, they delete the remaining segments and translate everything from scratch. The problem is that standard automatic metrics do not always reflect the quality of the MT output, especially for high-quality outputs, and there is still no clear correlation between PE effort and productivity scores.
We combine quantitative and qualitative methods to study several standard automatic metrics used to evaluate the quality of MT output, and compare them to measures of post-editing effort. We then study in detail different direct and indirect measures of effort in order to establish a correlation among them, and complement this study with an analysis of translators’ perceptions of the task.
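As a minimal illustration of how such a metric–effort correlation can be computed, the sketch below pairs sentence-level TER (via sacrebleu, using the post-edited text as reference) with per-segment post-editing time and reports Pearson and Spearman coefficients. The segment data and the choice of TER and timing as the two variables are assumptions made purely for demonstration; they are not the specific metrics or data of this study.

```python
# Illustrative sketch: correlating an automatic metric (sentence-level TER)
# with a direct measure of post-editing effort (seconds spent per segment).
# The segment data below is invented for demonstration purposes only.
from sacrebleu.metrics import TER
from scipy.stats import pearsonr, spearmanr

ter = TER()

# (raw MT output, post-edited version, post-editing time in seconds)
segments = [
    ("the house green is big", "the green house is big", 14.2),
    ("she reads books every day", "she reads books every day", 3.1),
    ("he have gone to school yesterday", "he went to school yesterday", 21.7),
]

# Using the post-edited text as the reference, TER approximates
# the technical post-editing effort (HTER).
metric_scores = [ter.sentence_score(mt, [pe]).score for mt, pe, _ in segments]
pe_times = [t for _, _, t in segments]

print("Pearson r:", pearsonr(metric_scores, pe_times))
print("Spearman rho:", spearmanr(metric_scores, pe_times))
```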
Finally, we conduct a fine-grained analysis of MT errors based on post-editing corrections and suggest an error-based approach to evaluating raw MT output that includes the use of challenge sets.
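The sketch below shows one possible way a challenge set could be scored automatically: each item targets a linguistic phenomenon and is checked against accepted target patterns. The items, phenomenon labels, and patterns are hypothetical, and real challenge-set evaluation typically relies on manual or more elaborate verification than this pattern matching.

```python
# Illustrative sketch of an error-based check on a small challenge set.
# Items, phenomenon labels, and accepted patterns are hypothetical examples.
import re
from collections import Counter

challenge_set = [
    {"source": "Ella es médica.", "phenomenon": "gender agreement",
     "accepted": [r"\bshe is a (doctor|physician)\b"]},
    {"source": "No he visto a nadie.", "phenomenon": "negation",
     "accepted": [r"\bhave(n't| not) seen any(one|body)\b"]},
]

def evaluate(challenge_set, translate):
    """Count passed items per phenomenon, given a translate(source) -> str function."""
    passed, total = Counter(), Counter()
    for item in challenge_set:
        hyp = translate(item["source"]).lower()
        total[item["phenomenon"]] += 1
        if any(re.search(p, hyp) for p in item["accepted"]):
            passed[item["phenomenon"]] += 1
    return {ph: (passed[ph], total[ph]) for ph in total}

# Example with a dummy MT system that always returns the same string:
print(evaluate(challenge_set, lambda s: "She is a doctor."))
```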