The Czech National Corpus

Authors: Jan Kocek, Marie Koprivová, Vera Schmiedtová
Published in: Proceedings of the Ninth EURALEX International Congress, EURALEX 2000: Stuttgart, Germany, August 8th - 12th, 2000 / Ulrich Heid (ed.), Stefan Evert (ed.), Egbert Lehmann (ed.), Christian Rohrer (ed.), 2000, pp. 127-132
Language: English
Abstract
- The paper traces the history of the Czech National Corpus (CNC) project. It reports on the current stage of its development, describes what type of corpus it is, and outlines the text processing methods and morphological annotation used in its compilation. It also briefly discusses the software used in the CNC.
  
  The Bank of Czech (BoC) now contains 330 million word forms. It is the basis of a representative corpus (SYN2000, 100 million word forms), which was created in spring 2000 and is intended as a source of material for future dictionaries. The lexical saturation of the material is currently being tested.
