Multilingual open domain key-word extractor proto-type

Authors: Alessandro Panunzi, Marco Fabbri, Massimo Moneglia
Published in: Proceedings of the XIII EURALEX International Congress (Barcelona, 15-19 July 2008) / ed. by Janet Ann DeCesaris, Elisenda Bernal, 2008, ISBN 978-84-96742-67-3, pp. 463-468
Language: English
Abstract
- Automatic keyword extraction is now a mature language technology. It enables the annotation of large numbers of documents for content gathering, indexing, searching and, more generally, identification. The reliability of results when processing documents in a multilingual environment, however, is still a challenge, particularly when documents are not limited to one specific semantic domain. The use of multi-term descriptors seems to be a good means of identifying content. According to our previous evaluations (Panunzi et al. 2006a, 2006b), the availability of multi-term keywords improves performance over mono-term keywords by a relative factor of 100%. The LABLITA tool presented in this demo now works in a multilingual environment as well. The demo calculates on the fly the number of mono-term and multiword keywords in parallel documents in English, Italian, German, French and Spanish, and will allow the audience to judge: a) the enhancement brought by multiword keywords for the identification of content; and b) the comparability of the performance obtained by the tool when processing different languages.
