Deep neural networks for music and audio tagging

Jordi Pons Puig


Author: Jordi Pons Puig
Thesis supervisor: Xavier Serra Casals
Defended: Universitat Pompeu Fabra (Spain), 2019
Language: Spanish
Thesis committee: Geoffroy Peeters (chair), Perfecto Herrera Boyer (secretary), Juhan Nam (member)
Doctoral program: Doctoral Program in Information and Communication Technologies, Universidad Pompeu Fabra
Subjects:
- Mathematics
- Arts and letters
  - Theory, analysis, and criticism of the fine arts
    - Music and musicology
Full text not available
Abstract
- Automatic music and audio tagging can help increase the retrieval and re-use possibilities of the many audio databases that remain poorly labeled. In this dissertation, we tackle music and audio tagging from a deep learning perspective and, within that context, address the following research questions: (i) Which deep learning architectures are most appropriate for (music) audio signals? (ii) In which scenarios is waveform-based end-to-end learning feasible? (iii) How much data is required to carry out competitive deep learning research? To answer research question (i), we propose musically motivated convolutional neural networks as an alternative way of designing deep learning models, one based on domain knowledge, and we evaluate several deep learning architectures for audio at a low computational cost using a novel methodology based on non-trained (randomly weighted) convolutional neural networks. Throughout our work, we find that employing music and audio domain knowledge during the model's design can help improve the efficiency, interpretability, and performance of spectrogram-based deep learning models.
  
  For research questions (ii) and (iii), we study SampleCNN, a recently proposed end-to-end learning model, to assess its viability for music audio tagging when varying amounts of training data (ranging from 25k to 1.2M songs) are available. We compare SampleCNN against a musically motivated spectrogram-based architecture and conclude that, given enough data, end-to-end learning models can achieve better results.
  
  Finally, in addressing research question (iii), we also investigate whether a naive regularization of the solution space, prototypical networks, transfer learning, or a combination of these can help deep learning models better leverage a small number of training examples. Results indicate that transfer learning and prototypical networks are powerful strategies in such low-data regimes.
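
For readers unfamiliar with prototypical networks, the classification step can be sketched as below. This is a minimal illustration and not the thesis's implementation: it assumes the embeddings have already been produced by some trained network, uses toy 2-D vectors in their place, and the function names `class_prototypes` and `predict` are hypothetical. Each class is represented by the mean of its few labeled (support) embeddings, and a query is assigned to the class with the nearest prototype under squared Euclidean distance.

```python
import numpy as np


def class_prototypes(embeddings, labels):
    """Return the sorted class ids and the mean embedding (prototype) of each class."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def predict(queries, classes, protos):
    """Assign each query embedding to the class of its nearest prototype."""
    # Squared Euclidean distance between every query and every prototype,
    # computed by broadcasting to shape (n_queries, n_classes, dim).
    dists = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]


# Toy 2-D "embeddings" (assumption): in practice these would be the
# outputs of a trained audio model, with only a few examples per class.
support = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]])
support_labels = np.array([0, 0, 1, 1])
queries = np.array([[0.1, 0.1], [0.9, 1.0]])

classes, protos = class_prototypes(support, support_labels)
predicted = predict(queries, classes, protos)
```

Because each prototype is only an average over a handful of examples, the classifier itself adds no trainable parameters beyond the embedding network, which is one reason the approach is often considered in low-data settings.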
