Automatic Extraction of TEI Structures in Digitized Lexical Resources using Conditional Random Fields

Authors: Mohamed Khemakhem, Luca Foppiano, Laurent Romary
Published in: Electronic lexicography in the 21st century: Proceedings of eLex 2017 conference / edited by Iztok Kosem, Carole Tiberius, Miloš Jakubíček, Jelena Kallas, Simon Krek, Vít Baisa, 2017, pp. 598-613
Language: English
Abstract
- This paper presents an open source machine learning system for structuring digitized dictionaries into TEI (Text Encoding Initiative) encoded resources. The approach extracts overgeneralised TEI structures in a cascading fashion by means of CRF (Conditional Random Fields) sequence labelling models. Through experiments carried out on two different dictionary samples, we aim to highlight the strengths as well as the limitations of our approach.

