Relative music loudness estimation in TV broadcast audio using deep learning: an industrial perspective

Meléndez Catalán, Blai


Bibliographic Details
Author:	Meléndez Catalán, Blai
Resource type:	doctoral thesis
Status:	Published version
Publication date:	2021
Country:	Spain
Institution:	CBUC, CESCA
Repository:	TDR. Tesis Doctorales en Red
OAI Identifier:	oai:www.tdx.cat:10803/671425
Online access:	http://hdl.handle.net/10803/671425
Access level:	open access
Keywords:	Music detection; Relative music loudness estimation; Deep learning; Copyright management industry; Public dataset; Annotation tool; Convolutional neural networks; Temporal convolutional networks; TV broadcast audio; Audio processing; Detecció de música; Estimació del volum relatiu de la música; Aprenentatge profund; Indústria del dret d’autor; Conjunt de dades públic; Eina d’anotació; Xarxes neuronals convolucionals; Xarxes temporals convolucionals; Àudio emès per TV; Processament d’àudio; 62

Description
Summary:	Under the current copyright management business model, broadcasters are taxed by the corresponding copyright management organization according to the percentage of music they broadcast, and the collected money is then distributed among the copyright holders of that music. In the specific case of TV broadcasts, whether a musical piece is played in the foreground or the background is often a relevant factor that affects the amount of money collected and distributed. In recent years, the music industry has increasingly adopted technological solutions to automate this process. We have conducted this industrial PhD at BMAT, a company that has an active role in providing these solutions: since 2015, this company has been offering a service that currently monitors about 4300 radio stations and TV channels to automatically detect the presence of music and to classify it as foreground or background music. We name this task relative music loudness estimation. From an industrial point of view, this thesis focuses on improving the technology behind the service; from an academic point of view, it aims to introduce and promote the task in the research field of music information retrieval, and provides computational approaches to it. The industrial and academic contributions of this thesis result from logical steps towards these goals. We first create BAT: a new open-source, web-based tool for the efficient annotation of audio events and their partial loudness in the presence of other simultaneous events. We use BAT to annotate two datasets: one private and the other public. We use the private dataset for training during the development of BMAT's new relative music loudness estimation algorithm, called the Deep Music Detector. The Deep Music Detector represents the first application of deep learning within BMAT, and provides a significant boost in performance with respect to its predecessor.
The public dataset, called OpenBMAT, is released in order to foster transparent, comparable and reproducible research on the task of relative music loudness estimation. We use OpenBMAT in our proposal of a novel deep learning solution to this task, based on an architecture that combines regular convolutional neural networks and temporal convolutional networks. This architecture is able to extract robust features from a time-frequency representation of an audio file and then model them as temporal sequences, producing state-of-the-art results with an efficient use of the network's parameters. Finally, this thesis also offers a review of the concepts, resources and literature on tasks related to the detection of music.
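
As an illustration of how the output of a relative music loudness estimator could feed the percentage-based reporting described in the summary, the following is a minimal sketch and not the thesis's or BMAT's implementation. It assumes the estimator's output has already been turned into non-overlapping segments of the form (start, end, label); the label names, the function name `class_percentages` and the example values are all hypothetical.

```python
from collections import defaultdict

# Hypothetical label names: the summary only speaks of foreground music,
# background music and the absence of music.
CLASSES = ("foreground-music", "background-music", "no-music")


def class_percentages(segments, total_duration):
    """Return the percentage of total_duration covered by each class.

    segments: iterable of (start_s, end_s, label) tuples, assumed to be
    non-overlapping and expressed in seconds.
    """
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    seconds = defaultdict(float)
    for start, end, label in segments:
        if label not in CLASSES:
            raise ValueError(f"unknown label: {label}")
        if end < start:
            raise ValueError("segment ends before it starts")
        seconds[label] += end - start
    # Classes with no segments are reported as 0.0 percent.
    return {label: 100.0 * seconds[label] / total_duration for label in CLASSES}


# Made-up two-minute excerpt, for illustration only.
segments = [
    (0.0, 30.0, "no-music"),
    (30.0, 75.0, "background-music"),
    (75.0, 120.0, "foreground-music"),
]
shares = class_percentages(segments, total_duration=120.0)
```

In this made-up excerpt, a quarter of the time contains no music and the remainder is split evenly between background and foreground music. How such shares translate into collected or distributed amounts depends on each copyright management organization and is not specified in the summary.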
