Relative music loudness estimation in TV broadcast audio using deep learning

Blai Meléndez Catalán

Author: Blai Meléndez Catalán
Thesis supervisors: Emilio Molina Martínez (supervisor), Emilia Gómez Gutiérrez (co-supervisor)
Defense: Universitat Pompeu Fabra (Spain), 2021
Language: Spanish
Thesis committee: Pedro Cano Vila (chair), Marius Miron (secretary), Amélie Anglade (member)
Doctoral program: Doctoral Program in Information and Communication Technologies, Universitat Pompeu Fabra
Subjects:
- Mathematics
  - Computer science
    - Artificial intelligence
    - Data banks
Links
- Open-access thesis at: TDX
Abstract
- Under the current copyright management business model, broadcasters are taxed by the corresponding copyright management organization according to the percentage of music they broadcast, and the collected money is then distributed among the copyright holders of that music. In the specific case of TV broadcasts, whether a musical piece is played in the foreground or the background often affects the amount of money collected and distributed. In recent years, the music industry has increasingly adopted technological solutions to automate this process. We have conducted this industrial PhD at BMAT, a company that plays an active role in providing these solutions: since 2015, it has offered a service that currently monitors about 4,300 radio stations and TV channels to automatically detect the presence of music and to classify it as foreground or background music. We name this task relative music loudness estimation. From an industrial point of view, this thesis focuses on improving the technology behind the service; from an academic point of view, it seeks to introduce and promote the task in the research field of music information retrieval, and provides computational approaches to it.
  
  The industrial and academic contributions of this thesis result from logical steps towards these goals. We first create BAT: a new open-source, web-based tool for the efficient annotation of audio events and their partial loudness in the presence of other simultaneous events. We use BAT to annotate two datasets: one private and the other public. We use the private dataset to train BMAT’s new relative music loudness estimation algorithm, the Deep Music Detector. The Deep Music Detector represents the first application of deep learning within BMAT, and provides a significant boost in performance with respect to its predecessor. The public dataset, called OpenBMAT, is released in order to foster transparent, comparable and reproducible research on the task of relative music loudness estimation. We use OpenBMAT in our proposal of a novel deep learning solution to this task, based on an architecture that combines regular convolutional neural networks and temporal convolutional networks. This architecture is able to extract robust features from a time-frequency representation of an audio file and then model them as temporal sequences, producing state-of-the-art results with an efficient use of the network’s parameters. Finally, this thesis also offers a review of the concepts, resources and literature about tasks related to the detection of music.
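
To make the task described in the abstract concrete, the following is a minimal sketch of what a relative music loudness label could look like per audio frame. It is not the thesis's method (which is a neural network trained on annotated audio). The function names, the class strings, the use of `None` for "not present", and the 0 dB margin are all hypothetical choices made for illustration.

```python
from typing import List, Optional

# Hypothetical margin in dB: music is called "foreground" when it is at least
# this much louder than the simultaneous non-music content. This value is an
# assumption for illustration, not a figure taken from the thesis.
FOREGROUND_MARGIN_DB = 0.0


def relative_loudness_class(
    music_db: Optional[float],
    other_db: Optional[float],
    margin_db: float = FOREGROUND_MARGIN_DB,
) -> str:
    """Label one frame from the loudness of music and of everything else.

    `None` means that kind of content is absent in the frame.
    """
    if music_db is None:
        return "no-music"
    if other_db is None:
        # Music alone is trivially in the foreground.
        return "fg-music"
    return "fg-music" if music_db - other_db >= margin_db else "bg-music"


def label_frames(
    music_db: List[Optional[float]],
    other_db: List[Optional[float]],
) -> List[str]:
    """Apply the per-frame rule to two aligned loudness sequences."""
    return [relative_loudness_class(m, o) for m, o in zip(music_db, other_db)]


labels = label_frames(
    [None, -20.0, -30.0, -18.0],   # music loudness per frame (dB)
    [-25.0, -30.0, -20.0, None],   # non-music loudness per frame (dB)
)
print(labels)
```

In practice the two loudness tracks are not available separately in broadcast audio, which is why the abstract frames the problem as one for a learned model operating on a time-frequency representation of the mixed signal.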
