On-line personalization and adaptation to voice disorders and variations in speech recognition systems

  • Author: Óscar Saz Torralba
  • Thesis supervisor: Eduardo Lleida Solano
  • Defense: Universidad de Zaragoza (Spain), 2009
  • Language: Spanish
  • Thesis examination committee: Helmer Strik (chair), Alfonso Ortega Gimenez (secretary), Juan Ignacio Godino Llorente (member), María Inés Torres Barañano (member), Francesc Vallverdú Bayés (member)
  • Subjects:
  • Links
    • Open-access thesis at: Zaguán
  • Abstract
    • This thesis deals with the research and development of speech technology-based systems for users with different impairments or disabilities, with the final aim of improving their quality of life. As these speakers usually present a wide range of speech disorders, their access to systems based on automatic speech recognition (ASR) and similar technologies is hindered. The thesis proposes the use of personalization techniques to raise the performance of these speech-based systems on the task of recognizing disordered speech.

      This work covers all the steps of research in speech technologies. A novel corpus containing nearly 3 hours of signal from young disabled speakers and nearly 9 hours of data from unimpaired age-matched individuals was acquired and is described in the thesis. The data collected from unimpaired speakers is used to develop a baseline ASR system adapted to young speakers. However, the baseline results achieved with this system by the impaired speakers are significantly degraded compared to those of their unimpaired peers, which points to the dramatic influence of the speakers' disorders on the performance of the ASR system.

      From this starting point, an in-depth analysis of the disordered speech corpus is made in two directions. The first shows the acoustic degradation of the speech uttered by the impaired speakers, compared to the control speech acquired from the unimpaired speakers. The second shows that speech and language disorders occur in the impaired speakers, by analyzing the phonological and phonetic patterns in their phoneme-level mispronunciations.

      Having shown that the disordered speech is degraded at both the acoustic and lexical levels, acoustic and lexical adaptation to the speakers in the corpus are studied. Strong interrelations between both adaptation frameworks are observed, and the need to match both adaptation strategies is pointed out. Both adaptation frameworks (acoustic and lexical) use supervised data-driven techniques to improve the word error rate (WER) in recognition, with a larger influence of the acoustic side in the ASR phase.

      Since labeled data cannot always be counted on when working with this kind of speaker, a system is needed that detects mispronunciations, so that these acoustically inaccurate parts of the speech signal can be avoided before the data is fed to the adaptation systems. Traditional log-likelihood scoring and normalization approaches to pronunciation verification are tested, together with some novel approaches. The possibility of identifying lexically correct and incorrect segments within the speech signal opens the door to unsupervised adaptation frameworks. These possibilities are studied over the different acoustic-lexical adaptation techniques used previously. Finally, a proposal for on-line personalization is made, in which the same utterances that the ASR system has decoded are used to perform adaptation and create new models for recognizing the following utterances from the speaker. In the end, a strong influence of the initial ASR performance is observed, which limits the possibilities of applying these techniques.

      The final part of the thesis covers all the attempts made during this work to develop speech technology-based systems for the handicapped and speech therapy tools. These systems make use of the scientific knowledge acquired in this work and are open for the whole community to use and share. The set of CASLT tools in "COMUNICA", the result of a collaborative effort, is shown to be successful in providing a semi-supervised aid for the speech handicapped, and has been very well received by the community.

      The scientific discussion and conclusions show that, even though there is still a great lack of knowledge about the use of speech technologies for disordered speech, there is an open possibility of creating personalized systems that can provide enhanced ASR performance to individuals with severe disabilities.
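
The abstract reports its adaptation results as WER improvements but does not say how the metric was computed. As a point of reference only, the standard word error rate can be sketched as follows: the word-level edit distance between a reference transcription and the recognizer output, divided by the number of reference words. The function names, the whitespace tokenization, and the relative-reduction helper are assumptions made for this illustration and are not taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch: tokenization is a plain whitespace split, which is
    an assumption and not necessarily what the thesis used.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction of the baseline WER removed by adaptation (hypothetical helper)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer
```

With this definition, a hypothesis that substitutes one word and drops another out of a four-word reference scores a WER of 0.5. Because insertions are counted, the value can exceed 1.0 for very poor recognition output.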

