This paper presents a new type of corpus-based information resource: supracorpora databases (SCDBs). SCDBs are designed to enhance the functionality of linguistic corpora by supporting customizable manual annotation of linguistic items, including multi-word items. This resembles the query result categorization functions available in some corpora and the functions provided by some standalone corpus annotation tools, although many SCDB features are more sophisticated (e.g. they allow detailed annotation of multi-word linguistic items, including specification of main words and immediate context). More importantly, SCDBs allow researchers to create annotated translation correspondences (TCs) in parallel corpora. The aggregation of searchable TCs in an SCDB constitutes a unique information resource that facilitates the creation of new explicit knowledge about cross-linguistic correspondences and translation models. The paper also includes an overview of the four SCDBs developed to date.