Data Augmentation Techniques for Speech Emotion Recognition and Deep Learning

  • José Antonio Nicolás [1] ; Javier de Lope [1] ; Manuel Graña [2]
    1. [1] Universidad Politécnica de Madrid, Madrid, Spain
    2. [2] Universidad del País Vasco/Euskal Herriko Unibertsitatea, Leioa, Spain

  • Published in: Bio-inspired Systems and Applications: from Robotics to Ambient Intelligence: 9th International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2022, Puerto de la Cruz, Tenerife, Spain, May 31 – June 3, 2022, Proceedings, Part II / José Manuel Ferrández Vicente (conference chair), José Ramón Álvarez Sánchez (conference chair), Félix de la Paz López (conference chair), Hojjat Adeli (author), 2022, ISBN 978-3-031-06527-9, pp. 279-288
  • Language: English
  • Full text not available
  • Abstract
    • This paper introduces innovations in both data augmentation and deep neural network architecture for speech emotion recognition (SER). The novel architecture combines a series of convolutional layers with a final layer of long short-term memory cells to determine emotions in audio signals. The audio signals are processed to generate mel spectrograms, which are used as inputs to the deep neural network architecture. This paper proposes a selected set of data augmentation techniques that reduce network overfitting. We achieve an average recognition accuracy of 86.44% on publicly distributed databases, outperforming state-of-the-art methods.
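
The abstract does not say which augmentation techniques were selected, so the following is only a minimal sketch of three common waveform-level augmentations (additive noise, time shift, gain change), not the authors' method. All function names, the 20 dB SNR, the ±6 dB gain range, and the toy 440 Hz tone are assumptions made for illustration; in a pipeline like the one described, each augmented waveform would then be converted to a mel spectrogram before being passed to the network.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at approximately the requested SNR (in dB)."""
    power = sum(x * x for x in signal) / len(signal)
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def time_shift(signal, shift):
    """Circularly shift the signal by `shift` samples (positive = delay)."""
    shift %= len(signal)
    if shift == 0:
        return list(signal)
    return signal[-shift:] + signal[:-shift]


def change_gain(signal, gain_db):
    """Scale the amplitude by a gain expressed in dB."""
    factor = 10 ** (gain_db / 20)
    return [x * factor for x in signal]


def augment(signal, rng):
    """Return three augmented copies of one waveform (hypothetical recipe)."""
    return [
        add_noise(signal, snr_db=20.0, rng=rng),
        time_shift(signal, shift=rng.randrange(1, len(signal))),
        change_gain(signal, gain_db=rng.uniform(-6.0, 6.0)),
    ]


# Toy input: 0.1 s of a 440 Hz tone sampled at 16 kHz (stand-in for speech).
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1600)]

rng = random.Random(0)  # fixed seed so the augmentations are reproducible
augmented = augment(tone, rng)
```

Each call to `augment` yields extra training examples that keep the original emotion label, which is the usual way augmentation enlarges a small SER corpus and counteracts overfitting.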

