Abstract of Automatic voice pleasantness classification and intensity estimation for speech synthesis

Speech synthesis systems based on hidden Markov models (HMMs) marked the beginning of a new generation of Text-to-Speech (TTS) technology. These stochastic models can describe time-domain and frequency-domain events simultaneously while providing a powerful and highly flexible synthesis framework. Despite their recognized advantages, some authors report a background buzz or a muffled voice, among other issues, which shows that the speech description/generation model needs improvement. Several vocoding technologies have already been adapted to the HMM synthesis framework, and none has produced an entirely satisfactory result, so this work proposes a different approach. To improve synthetic voice quality, we propose developing a perceptually weighted adaptive filter technique that can enhance parameter generation in the time and frequency domains and on an intra-segmental basis. The adaptation strategy will be based on prosodic correlates of voice preference in contextualized TTS applications, with the aim of maximizing voice intelligibility and overall naturalness. The proposed work will be dedicated entirely to European Portuguese, a language that still lacks several resources and tools.
