
Dialnet


Summary of Enhancement of Esophageal Speech using signal processing algorithms on source signal and vocal tract filter

Rizwan Ishaq

  • Speech, an essential component of daily communication, is sometimes altered by laryngeal cancer treatment. The treatment for advanced-stage laryngeal cancer is total laryngectomy. One consequence of total laryngectomy is that normal speech production is destroyed, so an alternative method of speech production is needed. Esophageal Speech (ES) is one such alternative. ES uses the esophagus in place of the larynx: air is taken in through the mouth into the lower esophagus and then released, which makes the esophagus vibrate and provides a voicing source for the vocal tract filter. Speech produced this way has low quality and low intelligibility because the voicing source is irregular and the vocal tract filter is altered. This thesis therefore presents an enhancement method for ES that transforms the source and vocal tract filter components into normal speech components. The system first decomposes the ES into source and vocal tract filter components using Iterative Adaptive Inverse Filtering (IAIF), and then transforms these components into normal speech components. The source, which is the most affected component, is first decomposed into a fundamental frequency (F0) curve, a Harmonic to Noise Ratio (HNR) and source spectrum components. The transformed source signal is built from a natural glottal pulse computed from normal speech, the F0 curve and HNR of normal speech, and the original source spectrum. The vocal tract filter is transformed by smoothing the vocal tract spectral peaks and then shifting them to lower frequencies using a second-order Frequency Warping Function (FWF). The spectral peaks are then widened to bring them closer to those of natural speech. The system is evaluated subjectively with listening tests and objectively with HNR, on the Spanish ES vowels /a/, /e/, /i/, /o/, /u/ and 28 commonly used Spanish words.
The subjective listening tests, using Mean Opinion Score (MOS) and preference scores, show that the MOS of the proposed system is always between 3 and 4, and that the preference for all processed samples is above 50%. The objective HNR results show an improvement of 10 to 15 dB over the original ES samples.
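
The peak-shifting step described in the abstract can be illustrated with a small sketch. The abstract does not give the form of the thesis's second-order Frequency Warping Function, so the quadratic map g(u) = u + alpha * u * (1 - u) on normalized frequency u, the parameter `alpha`, and the function names `warp_map` and `warp_spectrum` below are all assumptions chosen for illustration, not the author's implementation. The idea shown is only that resampling a spectral envelope at g(u) >= u moves its peaks toward lower frequencies while leaving the band edges fixed.

```python
import numpy as np

def warp_map(u, alpha):
    """Hypothetical second-order warping of normalized frequency u in [0, 1].

    g(0) = 0 and g(1) = 1, and g(u) >= u for 0 <= alpha < 1.
    """
    return u + alpha * u * (1.0 - u)

def warp_spectrum(magnitude, alpha=0.2):
    """Resample a magnitude envelope so that its peaks move to lower frequencies.

    Assumption: the envelope is sampled uniformly from 0 to the Nyquist frequency.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1) to keep the mapping monotonic")
    magnitude = np.asarray(magnitude, dtype=float)
    u = np.linspace(0.0, 1.0, magnitude.size)
    # Output bin u reads the input envelope at g(u) >= u, so a peak located
    # at frequency p in the input appears at the lower frequency g^{-1}(p).
    return np.interp(warp_map(u, alpha), u, magnitude)

# Synthetic envelope with a single formant-like peak at 0.4 of the band.
u = np.linspace(0.0, 1.0, 513)
envelope = np.exp(-0.5 * ((u - 0.4) / 0.03) ** 2)
warped = warp_spectrum(envelope, alpha=0.2)

peak_before = u[np.argmax(envelope)]
peak_after = u[np.argmax(warped)]
print(f"peak before: {peak_before:.3f}, peak after: {peak_after:.3f}")
```

Larger values of `alpha` shift peaks further down. In a real system the warped envelope would still have to be converted back into a vocal tract filter, which this sketch does not attempt.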

