Madrid, Spain
This paper presents a linguistic approach based on weighted finite-state transducers for the lexical normalisation of Spanish Twitter messages. The system consists of transducers that are applied to out-of-vocabulary tokens. The transducers implement linguistic models of variation that generate sets of candidates according to a lexicon. A statistical language model is then used to obtain the most probable sequence of words. The article describes the components and evaluates the system and some of its parameters.
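The pipeline outlined in the abstract (candidate generation for out-of-vocabulary tokens, followed by language-model decoding) can be illustrated with a minimal sketch. It is not the authors' system. Weighted regular-expression rewrite rules stand in for the weighted finite-state transducers, and the lexicon, rule costs, bigram scores and the names `candidates` and `normalise` are all hypothetical values chosen for illustration.

```python
import re

# Hypothetical toy lexicon (not from the paper).
LEXICON = {"hola", "te", "tu", "que", "quiero", "mucho"}

# Hypothetical bigram log-probabilities standing in for the statistical
# language model; any unseen bigram receives UNSEEN_LOGP.
BIGRAM_LOGP = {
    ("<s>", "hola"): -1.0,
    ("hola", "te"): -1.0,
    ("te", "quiero"): -0.5,
    ("quiero", "mucho"): -0.7,
}
UNSEEN_LOGP = -10.0

# Weighted rewrite rules as a simplified stand-in for variation transducers.
# Each rule is (pattern, replacement, cost); the costs are assumptions.
RULES = [
    (re.compile(r"(.)\1+"), r"\1", 0.5),  # collapse repeated characters
    (re.compile(r"k"), "qu", 1.0),        # "kiero" -> "quiero"
    (re.compile(r"x"), "ch", 1.0),        # "muxo" -> "mucho"
    (re.compile(r"^t$"), "te", 1.0),      # abbreviation "t" -> "te"
    (re.compile(r"^t$"), "tu", 1.0),      # abbreviation "t" -> "tu"
]
MAX_STEPS = 2  # maximum number of rules chained on one token


def candidates(token):
    """Return {in-lexicon candidate: variation cost} for one token."""
    if token in LEXICON:
        return {token: 0.0}  # in-vocabulary tokens are left untouched
    frontier = {token: 0.0}
    found = {}
    for _ in range(MAX_STEPS):
        next_frontier = {}
        for form, cost in frontier.items():
            for pattern, repl, rule_cost in RULES:
                new_form = pattern.sub(repl, form)
                if new_form == form:
                    continue
                new_cost = cost + rule_cost
                if new_cost < next_frontier.get(new_form, float("inf")):
                    next_frontier[new_form] = new_cost
        for form, cost in next_frontier.items():
            if form in LEXICON and cost < found.get(form, float("inf")):
                found[form] = cost
        frontier = next_frontier
    # If no rule chain reaches the lexicon, keep the token unchanged.
    return found or {token: 0.0}


def normalise(tokens):
    """Viterbi search for the best word sequence under the bigram model."""
    best = {"<s>": (0.0, [])}  # last word -> (score, path so far)
    for token in tokens:
        new_best = {}
        for word, cost in candidates(token).items():
            for prev, (score, path) in best.items():
                lm = BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP)
                total = score + lm - cost
                if word not in new_best or total > new_best[word][0]:
                    new_best[word] = (total, path + [word])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


print(normalise(["holaaa", "t", "kiero", "muxo"]))
# -> ['hola', 'te', 'quiero', 'mucho']
```

In this toy setting the token "t" has two equally cheap candidates, "te" and "tu", and the bigram scores decide between them. A real system would presumably compose actual transducers with a lexicon automaton rather than chain regular expressions.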