Enhanced U-Net architectures for accurate room impulse response generation via differential-phase learning



Bibliographic details
Authors: Martin-Salinas, I.; Belloch, Jose A.; Amor-Martin, Adrian; Piñero, Gema (ORCID: 0000-0002-8719-8106)
Document type: article
Publication date: 2025
Country: Spain
Resources: Universitat Politècnica de València (UPV)
Repository: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Language: English
OAI Identifier: oai:dnet:riunet______::81cef6611b5a5ae1f6b187f7af98c712
Online access: https://riunet.upv.es/handle/10251/233846
Access level: Open access
Keywords: RIR
Deep learning
Signal processing
Gen AI
03.- Ensure healthy lives and promote well-being for all at all ages
08.- Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all
09.- Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation
Description
Abstract: [EN] Generating accurate room impulse responses (RIRs) remains challenging, particularly regarding phase estimation. Building upon previous work utilizing encoder-decoder deep learning architectures, this paper investigates advanced techniques to improve phase prediction accuracy. We propose and evaluate several enhanced U-Net models, including variants with a variational autoencoder (VAE) bottleneck and differing input conditioning methods for spatial and room parameters (embedding layers vs. normalized dense layers). A key focus is the comparison between predicting direct phase and differential phase. Furthermore, we analyze the impact of using mean absolute error (MAE) versus mean squared error (MSE) for the magnitude component of the loss function. The study also explores the efficacy of applying the Griffin-Lim algorithm as a post-processing step to refine the phase estimated by the networks. Performance is evaluated on a real RIR dataset, comparing the different model architectures, information vector encoding strategies, phase targets (direct vs. differential), loss functions, and the contribution of phase recovery algorithms to overall RIR fidelity. Results provide insights into effective strategies for enhancing phase generation in data-driven RIR synthesis.
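
Illustrative note: the record does not define "differential phase" or give the exact loss terms, so the following is only a minimal sketch under stated assumptions. It assumes that differential phase means the wrapped first-order difference between neighbouring phase values along one axis (for example, adjacent frequency bins), and it shows plain MAE and MSE as they might be applied to a magnitude vector. All function names are hypothetical and are not taken from the paper.

```python
import cmath
import math


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def to_differential(phase):
    """Assumed definition: keep the first phase value, then store the
    wrapped difference between each value and its predecessor."""
    return [phase[0]] + [wrap(b - a) for a, b in zip(phase, phase[1:])]


def from_differential(diff):
    """Recover the direct phase (modulo 2*pi) by cumulative summation."""
    recovered, running = [], 0.0
    for d in diff:
        running += d
        recovered.append(wrap(running))
    return recovered


def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


# Toy complex spectrum (made-up values, not data from the paper).
spectrum = [1 + 0j, 0 + 1j, -1 + 0.5j, -0.5 - 1j, 1 - 1j]
phase = [cmath.phase(z) for z in spectrum]
diff_phase = to_differential(phase)
recovered = from_differential(diff_phase)
```

Under these assumptions the differential representation loses no information: `recovered` matches `phase` up to multiples of 2π, so a network could be trained on either target and the direct phase rebuilt afterwards.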