Jesus Andres Ferrer
This thesis gathers some contributions to statistical pattern recognition and, more specifically, to several natural language processing (NLP) tasks. Several well-known statistical techniques are revisited in this thesis: parameter estimation, loss function design and probability modelling. These techniques are applied to several NLP tasks such as text classification (TC), language modelling (LM) and statistical machine translation (SMT).
In parameter estimation, we tackle the smoothing problem by proposing a constrained domain maximum likelihood estimation (CDMLE) technique.
CDMLE avoids the need for a smoothing stage, which causes maximum likelihood estimation (MLE) to lose its good theoretical properties. This technique is applied to text classification by means of the Naive Bayes classifier. Afterwards, the CDMLE technique is extended to leaving-one-out MLE and then applied to LM smoothing. The results obtained in several LM tasks show an improvement in perplexity over standard smoothing techniques.
Concerning the loss function, we carefully study the design of loss functions other than the 0-1 loss. We focus our study on those loss functions that, while retaining a decoding complexity similar to that of the 0-1 loss function, provide more flexibility.
Many candidate loss functions are presented and analysed in several statistical machine translation tasks and for several translation models. We also analyse some notable translation rules, such as the direct translation rule, and we give further insight into log-linear models, which are, in fact, particular cases of loss functions.
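As a reminder of the standard setting behind this discussion, the Bayes decision rule under a general loss and its 0-1 special case can be written as follows. The notation is assumed here for illustration and is not taken from the thesis: $x$ is the source sentence, $y$ and $y'$ range over candidate translations, and $l(y \mid x, y')$ is the loss incurred by proposing $y$ when $y'$ is correct.

\[
\hat{y}(x) = \operatorname*{arg\,min}_{y} \sum_{y'} l(y \mid x, y')\, p(y' \mid x)
\]

\[
l(y \mid x, y') = 1 - \delta(y, y')
\;\;\Longrightarrow\;\;
\hat{y}(x) = \operatorname*{arg\,min}_{y} \bigl(1 - p(y \mid x)\bigr) = \operatorname*{arg\,max}_{y}\, p(y \mid x)
\]

Under the 0-1 loss the sum over $y'$ collapses to a single term, which is why decoding reduces to a maximisation of the posterior probability.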
Finally, several monotone translation models are proposed based on well-known modelling techniques. Firstly, an extension to the GIATI technique is proposed to infer finite state transducers (FSTs). Afterwards, a phrase-based monotone translation model inspired by hidden Markov models is proposed. Lastly, a phrase-based hidden semi-Markov model is introduced. The latter model produces slight improvements over the baseline under some circumstances.