Relative music loudness estimation in TV broadcast audio using deep learning: an industrial perspective

Bibliographic details
Author: Meléndez Catalán, Blai
Resource type: doctoral thesis
Status: published version
Publication date: 2021
Country: Spain
Institution: CBUC, CESCA
Repository: TDR. Tesis Doctorales en Red
OAI Identifier: oai:www.tdx.cat:10803/671425
Online access: http://hdl.handle.net/10803/671425
Access level: open access
Keywords: Music detection
Relative music loudness estimation
Deep learning
Copyright management industry
Public dataset
Annotation tool
Convolutional neural networks
Temporal convolutional networks
TV broadcast audio
Audio processing
Description
Summary: Under the current copyright management business model, broadcasters are taxed by the corresponding copyright management organization according to the percentage of music they broadcast, and the collected money is then distributed among the copyright holders of that music. In the specific case of TV broadcasts, whether a musical piece is played in the foreground or the background often affects the amount of money collected and distributed. In recent years, the music industry has increasingly adopted technological solutions to automate this process. We have conducted this industrial PhD at BMAT, a company that plays an active role in providing these solutions: since 2015, it has offered a service that currently monitors about 4,300 radio stations and TV channels to automatically detect the presence of music and classify it as foreground or background music. We call this task relative music loudness estimation. From an industrial point of view, this thesis focuses on improving the technology behind the service; from an academic point of view, it seeks to introduce and promote the task in the research field of music information retrieval, and provides computational approaches to it.
The industrial and academic contributions of this thesis follow from logical steps towards these goals. We first create BAT, a new open-source, web-based tool for efficiently annotating audio events and their partial loudness in the presence of other simultaneous events. We use BAT to annotate two datasets, one private and one public. We use the private dataset to train BMAT's new relative music loudness estimation algorithm, called the Deep Music Detector. The Deep Music Detector is the first application of deep learning within BMAT, and provides a significant performance improvement over its predecessor.
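Relative music loudness estimation yields a decision per stretch of audio (no music, background music, or foreground music), while the fees described above depend on the share of broadcast time in each category. The Python sketch below shows one plausible way to get from frame-level classifier output to those shares. It is a minimal illustration under stated assumptions, not BMAT's or the thesis's method: the label strings `no-music`, `bg-music` and `fg-music`, the fixed hop size, and the helper names `frames_to_segments` and `music_shares` are all hypothetical, and the source does not describe how a copyright management organization turns these shares into fees.

```python
from collections import defaultdict

# Hypothetical label set: one relative-loudness class per frame.
LABELS = ("no-music", "bg-music", "fg-music")


def frames_to_segments(frame_labels, hop_s):
    # Merge runs of identical per-frame labels into (start_s, end_s, label).
    segments = []
    for i, label in enumerate(frame_labels):
        t = i * hop_s
        if segments and segments[-1][2] == label:
            start = segments[-1][0]
            segments[-1] = (start, t + hop_s, label)  # extend the current run
        else:
            segments.append((t, t + hop_s, label))  # start a new run
    return segments


def music_shares(segments):
    # Fraction of the analysed time spent in each class.
    totals = defaultdict(float)
    for start, end, label in segments:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        if end < start:
            raise ValueError("segment ends before it starts")
        totals[label] += end - start
    total = sum(totals.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: totals[label] / total for label in LABELS}


if __name__ == "__main__":
    # Toy output of a frame classifier with a hypothetical 1-second hop.
    frames = ["no-music"] * 30 + ["bg-music"] * 15 + ["fg-music"] * 15
    segments = frames_to_segments(frames, hop_s=1.0)
    print(segments)
    print(music_shares(segments))
```

For example, 30 s without music, 15 s of background music and 15 s of foreground music give shares of 0.5, 0.25 and 0.25; the overall share of music is the sum of the last two.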
The public dataset, called OpenBMAT, is released to foster transparent, comparable and reproducible research on the task of relative music loudness estimation. We use OpenBMAT in our proposal of a novel deep learning solution to this task, based on an architecture that combines regular convolutional neural networks with temporal convolutional networks. This architecture extracts robust features from a time-frequency representation of an audio file and then models them as temporal sequences, producing state-of-the-art results while using the network's parameters efficiently. Finally, this thesis also reviews the concepts, resources and literature on tasks related to music detection.
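The architecture is only summarized here. As a rough illustration of its temporal half, the following Python sketch implements the core operation of a temporal convolutional network: a dilated causal 1-D convolution, stacked with exponentially growing dilations and residual connections. It rests on several assumptions and is not the thesis's network: a single feature value per frame stands in for the output of the convolutional front end (omitted here), the kernel and dilation schedule are hypothetical, each level has one convolution followed by a ReLU, and nothing is learned.

```python
def dilated_causal_conv1d(x, kernel, dilation):
    # y[t] = sum_k kernel[k] * x[t - k*dilation]; inputs before t=0 are
    # treated as zeros (implicit left padding), so y[t] never sees x[t+1:].
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    # Frames visible to one output when each level has a single causal conv.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def tcn_stack(x, kernel, dilations):
    # Hypothetical toy TCN: one conv per level, ReLU, residual connection.
    for d in dilations:
        y = dilated_causal_conv1d(x, kernel, d)
        x = [xi + max(0.0, yi) for xi, yi in zip(x, y)]
    return x


if __name__ == "__main__":
    frames = [0.0, 1.0, 0.5, 0.0, 0.2, 0.9, 0.1, 0.0]  # toy per-frame feature
    print(tcn_stack(frames, kernel=[0.5, 0.5], dilations=[1, 2, 4]))
    print(receptive_field(2, [1, 2, 4, 8]))  # 16
```

With kernel size 2 and dilations 1, 2, 4 and 8, each output frame depends on the current frame and the 15 preceding ones, so the context grows exponentially with depth while the number of weights per level stays fixed. This is one reason dilated stacks can be parameter-efficient. Common TCN formulations use two convolutions per residual block, which doubles the (k - 1)·d contribution of each level.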