Word normalization in Twitter using finite-state transducers

Jordi Porta [1]; José Luis Sancho [1]
[1] Real Academia Española, Madrid, Spain
Published in: XXIX Congreso de la Sociedad Española de Procesamiento de Lenguaje Natural: SEPLN 2013 / coordinated by Alberto Díaz Esteban, Iñaki Alegría Loinaz, Julio Villena Román, 2013, ISBN 978-84-695-8349-4, pp. 86-90
Language: English
Abstract
- This paper presents a linguistic approach based on weighted finite-state transducers for the lexical normalisation of Spanish Twitter messages. The system consists of transducers that are applied to out-of-vocabulary tokens. The transducers implement linguistic models of variation that generate sets of candidates according to a lexicon. A statistical language model is then used to obtain the most probable sequence of words. The article includes a description of the components and an evaluation of the system and some of its parameters.
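
The pipeline described in the abstract (candidate generation for out-of-vocabulary tokens, filtering against a lexicon, then selection by a language model) can be illustrated with a minimal sketch. This is not the authors' system: plain Python rules stand in for the weighted finite-state transducers, and the lexicon, abbreviation table, costs, and bigram scores (`LEXICON`, `ABBREVIATIONS`, `BIGRAM_LOGP`, `UNSEEN_LOGP`) are all hypothetical toy values chosen for illustration.

```python
import re

# Hypothetical toy resources; a real system would load a full lexicon
# and a trained statistical language model.
LEXICON = {"hola", "que", "tal", "queso"}
ABBREVIATIONS = {"q": "que", "k": "que"}
BIGRAM_LOGP = {("<s>", "hola"): -0.5, ("hola", "que"): -0.7, ("que", "tal"): -0.4}
UNSEEN_LOGP = -5.0  # assumed back-off score for unseen bigrams


def candidates(token):
    """Return (word, cost) pairs for a token; only OOV tokens are rewritten."""
    token = token.lower()
    if token in LEXICON:
        return [(token, 0.0)]
    found = {}
    # Variation rule 1: collapse runs of repeated characters ("holaaa" -> "hola").
    collapsed = re.sub(r"(.)\1+", r"\1", token)
    # Variation rule 2: expand a known abbreviation ("q" -> "que").
    for form, cost in ((collapsed, 1.0), (ABBREVIATIONS.get(token), 1.5)):
        if form in LEXICON:  # keep only candidates licensed by the lexicon
            found[form] = min(cost, found.get(form, cost))
    # If no rule yields a lexicon word, keep the token itself with a high cost.
    return sorted(found.items()) or [(token, 10.0)]


def normalize(tokens):
    """Pick the best-scoring word sequence with a Viterbi search over candidates."""
    # best[word] = (score, path) of the best sequence ending in `word`
    best = {"<s>": (0.0, [])}
    for token in tokens:
        step = {}
        for word, cost in candidates(token):
            score, path = max(
                (prev_score + BIGRAM_LOGP.get((prev, word), UNSEEN_LOGP) - cost, prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            step[word] = (score, path + [word])
        best = step
    return max(best.values())[1]


print(normalize(["holaaa", "q", "tal"]))
```

In this sketch the rule costs play the role of transducer weights and the bigram table plays the role of the language model; both are combined additively in log space, which is one common choice and an assumption here rather than something the abstract specifies.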