Three-step coreference-based summarizer for Polish news texts

    1. [1] Institute of Computer Science, Polish Academy of Sciences
  • Location: Poznan Studies in Contemporary Linguistics, ISSN 1732-0747, e-ISSN 1897-7499, Vol. 55, No. 2, 2019 (Issue dedicated to: Current state of the art in language technology for Polish), pp. 397-443
  • Language: English
  • Abstract
    • This article addresses the problem of automatic summarization of press articles in Polish. The main novelty of this research lies in the proposal of a three-step summarization algorithm that benefits from coreference information.

      The related work section presents all coreference-based approaches to summarization. We then describe in detail all publicly available summarization tools developed for Polish. We state the problem of single-document press article summarization for Polish and describe the training and evaluation dataset: the POLISH SUMMARIES CORPUS.

      Next, NICOLAS, a new coreference-based extractive summarization system, is introduced. Its algorithm uses advanced third-party preprocessing tools to extract coreference information from the text to be summarized. This information is transformed into a complex set of features related to coreference concepts (mentions and coreference clusters), which are used to train the summarization system on a manually prepared corpus of gold summaries.

      The proposed solution is compared to the best publicly available summarization systems for Polish and to two state-of-the-art tools developed for English but adapted to Polish for this article. The NICOLAS summarization system obtains the best scores, outperforming the other systems in a statistically significant way on selected metrics. The evaluation also includes the calculation of two upper bounds: human performance and a theoretical upper bound.

