Named entity recognition for Polish

Michał Marcińczuk [1]; Aleksander Wawer [2]
[1] Wroclaw University of Science and Technology
[2] Institute of Computer Science, Polish Academy of Sciences
Published in: Poznan Studies in Contemporary Linguistics, ISSN 1732-0747, e-ISSN 1897-7499, Vol. 55, No. 2, 2019 (Issue devoted to: Current state of the art in language technology for Polish), pp. 239-269
Language: English
Abstract
In this article we discuss the current state of the art in named entity recognition for Polish. We present publicly available resources and open-source tools for the task. The overview covers various kinds of resources, i.e., guidelines, annotated corpora (NKJP, KPWr, CEN, PST) and lexicons (NELexiconS, PNET, Gazetteer). We present the major NER tools for Polish (Sprout, NERF, Liner2, Parallel LSTM-CRFs and PolDeepNer) and discuss their performance on the reference datasets. The article covers identification of named entity mentions in running text, local and global entity categorization, fine- and coarse-grained categorization, and lemmatization of proper names.
