Ayuda
Ir al contenido

Dialnet


Resumen de Large vocabulary continuous speech recognition for the transcription of catalan broadcast news and conversations. Towards analysis and modelling of acoustic reduction in spontaneous speech

Henrik Schulz

  • The transcription of spontaneous speech still poses a challenge to state-of-the-art methods for automatic speech recognition. The present thesis describes the comprehensive development of a large vocabulary continuous speech recognition system for the transcription of Catalan broadcast news and conversions and evolves towards novel approaches for analysis and modelling of acoustic reduction in spontaneous speech.

    It emphasises initially on various conventional methods for acoustic analysis, acoustic and language modelling and hypothesis search. Improvements over the original single-pass baseline system are mainly attained by domain and speaking style emphasising interpolation of individually estimated language models, linear discriminating projection of acoustic observations that improves the phonetic class separability, speaker normalisation of the acoustic observations, speaker adaptive training and acoustic model adaptation in a multi-pass system approach.

    The analysis of acoustic reduction initially emphasises on context independent vowel and consonant specific spectral and temporal properties whose parameters display statistically significant differences between the phoneme prototypes in spontaneous speech and their canonical realisations in planned speech. The introduction of the feature space analysis provides general means to reveal these differences in conventional acoustic observations for automatic speech recognition. It displays statistically significant differences context-independently but also between adjacent phonemes in a syllable context suggesting particular reduction patterns. The analysis furthermore challenges the often suggested coherence between the co-occurring reduction of spectral and temporal properties.

    The modelling of acoustic reduction first emphasises on segment conditioned discriminating variables and variability class dependent models and variability class specific adaptation of the original acoustic model. It introduces phoneme rate as means to analyse temporal properties and feature space reduction ratio as means to analyse reduction spectral properties in conventional feature space for large vocabulary continuous speech recognition as discriminating variables. These variables are clustered and determine the classes for segment conditioned variability class dependent models and their scoring during the hypothesis search in recognition. Both approaches displays no significant performance improvement.

    Furthermore the modelling advances towards segment constituent predictability dependent models that introduces predictability as discriminating variable for variability class dependent models relying on the fundamental coherence between predictability and acoustic reduction that is suggested through the principle of least effort and the redundancy theory. It thereby emphasises on word and phoneme predictability. This approach displays no significant performance improvement.


Fundación Dialnet

Dialnet Plus

  • Más información sobre Dialnet Plus