Supracorpora Databases as Corpus-Based Superstructure for Manual Annotation of Parallel Corpora

Authors: Mikhail Kruzhkov
Source: CILC2016: 8th International Conference on Corpus Linguistics / Antonio Moreno Ortiz (ed.), Chantal Pérez Hernández (ed.), 2016, pp. 236-248
Language: English
Abstract
- This paper presents a new type of corpus-based information resource: supracorpora databases (SCDBs). SCDBs are designed to enhance the functionality of linguistic corpora by supporting customizable manual annotation of linguistic items, including multi-word items. This is similar to the query result categorization functions available in some corpora and to the functions provided by some standalone corpus annotation tools, although many features supported by SCDBs are more sophisticated (e.g. they allow for detailed annotation of multi-word linguistic items, including specification of main words and immediate context). More importantly, SCDBs allow researchers to create annotated translation correspondences (TCs) in parallel corpora. The aggregation of searchable TCs in an SCDB represents a unique information resource that facilitates the creation of new explicit knowledge about cross-linguistic correspondences and translation models. The paper also includes an overview of the four SCDBs developed to date.
