Linguists and engineers in Natural Language Processing increasingly rely on electronic corpora. Most research was long limited to raw (unannotated) texts or to tagged texts (annotated with parts of speech only), but these approaches are restricted to a word-by-word perspective. A newer line of research involves corpora with richer annotations, such as clauses and major constituents, grammatical functions, and dependency links. The first parsed corpora were the English Lancaster Treebank and the Penn Treebank; new ones have recently been developed for other languages.
The Penn Treebank: An Overview
pp. 5-22
Annotation of Error Types for German Newsgroup Corpus
Markus Becker, Andrew Bredenkamp, Berthold Crysmann, Judith Klein
pp. 89-100
An HPSG-Annotated Test Suite for Polish
Malgorzata Marciniak, Agnieszka Mykowiecka, Adam Przepiórkowski, Anna Kupść
pp. 129-146
Developing a Syntactic Annotation Scheme and Tools for a Spanish Treebank
Antonio Moreno Sandoval, Susana López, Fernando Sánchez, Ralph Grishman
pp. 149-163
Building the Italian Syntactic-Semantic Treebank
Simonetta Montemagni, Francesco Barsotti, Marco Battista, Nicoletta Calzolari, Ornella Corazzari, Alessandro Lenci
pp. 189-210
Automated Creation of a Medieval Portuguese Partial Treebank
Vitor Rocio, Mário Amado Alves, J. Gabriel Lopes, Maria Francisca Xavier, Graça Vicente
pp. 211-227