Approaches to computational lexicography for German varieties

Andrea Abel; Stefanie Anstein

Authors: Andrea Abel, Stefanie Anstein
Source: Proceedings of the XIII EURALEX International Congress (Barcelona, 15-19 July 2008) / coordinated by Janet Ann DeCesaris, Elisenda Bernal, 2008, ISBN 978-84-96742-67-3, pp. 251-260
Language: English
Abstract
- indispensable resource for a detailed and systematic variety comparison and dictionary development. We present desiderata and suggestions as well as methods from computational linguistics to systematically apply variety corpora for the enrichment (i.e. confirmation, extension and generation) of lexical entries in distinctive variant dictionaries for German. Examples are the variant dictionaries developed by Ammon et al. (2004) and Abfalterer (2007); we focus on South Tyrolean German. First, we conducted a systematic frequency analysis in newspaper variety corpora for approved lists of South Tyrolean special vocabulary in order to refine the corresponding dictionary entries with corpus evidence where possible. Second, we filtered the word list of our South Tyrolean corpus for words that could not be lemmatised by a tool developed for the variety used in Germany. After removing special vocabulary already collected for the South Tyrolean variety in other projects (e.g. legal terms), the remaining list was manually checked for possible new variant dictionary entries. This innovative approach to variety corpus lexicography thus automatically filters a large amount of data so that only the relevant data need to be investigated in detail. Third, we semi-automatically extracted lexical cooccurrences from our two newspaper corpora and compared their frequencies, on the assumption that cooccurrences with high frequency in the South Tyrolean corpus and very low frequency in the corpus from Germany are worth investigating more closely. With these three methods we were able not only to refine dictionary entries for South Tyrolean German, but also to add new ones. The findings on variants can be reused for further corpus annotation, which in turn yields better resources for computational variant lexicography of the kind described; the approach is also to be extended to more complex linguistic levels.
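
The filtering and cooccurrence comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the thresholds, the use of adjacent word pairs as a stand-in for lexical cooccurrences, and the placeholder tokens are all assumptions made for illustration only.

```python
from collections import Counter


def adjacent_pairs(tokens):
    """Count adjacent word pairs (an assumed, simplified notion of cooccurrence)."""
    return Counter(zip(tokens, tokens[1:]))


def variety_candidates(variety_tokens, reference_tokens,
                       min_variety=2, max_reference=0):
    """Return pairs that are frequent in the variety corpus and rare in the
    reference corpus. Both thresholds are hypothetical defaults."""
    variety = adjacent_pairs(variety_tokens)
    reference = adjacent_pairs(reference_tokens)
    return sorted(
        pair for pair, count in variety.items()
        if count >= min_variety and reference[pair] <= max_reference
    )


def unknown_word_candidates(tokens, known_lemmas, already_collected):
    """Keep words the reference lemmatiser does not know, after removing
    vocabulary already collected elsewhere; the result is a frequency list
    intended for manual checking."""
    counts = Counter(tokens)
    return {
        word: count for word, count in counts.items()
        if word not in known_lemmas and word not in already_collected
    }


# Placeholder tokens, not real corpus data.
variety_tokens = "x y z x y q x y".split()
reference_tokens = "x q y z y z".split()
pair_candidates = variety_candidates(variety_tokens, reference_tokens)

word_candidates = unknown_word_candidates(
    ["x", "y", "n", "n", "t"],
    known_lemmas={"x", "y"},
    already_collected={"t"},
)
```

In this toy run, `pair_candidates` contains only the pair `("x", "y")`, which occurs three times in the placeholder variety corpus and never in the reference corpus, and `word_candidates` keeps only `"n"`. In a real setting the candidates would still need the manual check that the abstract describes.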
