The Impact of Word Representations on Sequential Neural MWE Identification

Nicolas Zampieri [1]; Carlos Ramisch [1]; Géraldine Damnati [2]
[1] Aix Marseille Université
[2] Orange Labs Lannion, France
Published in: Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019): August 2, 2019, Florence, Italy: Proceedings of the Workshop / Agata Savary (ed.), Carla Parra Escartín (ed.), Francis Bond (ed.), Jelena Mitrovic (ed.), Verginica Barbu Mititelu (ed.), 2019, ISBN 978-1-950737-26-0, pp. 169-175
Language: English
Abstract
- Recent initiatives such as the PARSEME shared task have allowed the rapid development of MWE identification systems. Many of those are based on recent NLP advances, using neural sequence models that take continuous word representations as input. We study two related questions in neural verbal MWE identification: (a) the use of lemmas and/or surface forms as input features, and (b) the use of word-based or character-based embeddings to represent them. Our experiments on Basque, French, and Polish show that character-based representations yield systematically better results than word-based ones. In some cases, character-based representations of surface forms can be used as a proxy for lemmas, depending on the morphological complexity of the language.
