Enhanced U-Net architectures for accurate room impulse response generation via differential-phase learning
| Authors: | , , , |
|---|---|
| Document type: | Article |
| Publication date: | 2025 |
| Country: | Spain |
| Source: | Universitat Politècnica de València (UPV) |
| Repository: | RiuNet. Repositorio Institucional de la Universitat Politécnica de Valéncia |
| Language: | English |
| OAI Identifier: | oai:dnet:riunet______::81cef6611b5a5ae1f6b187f7af98c712 |
| Online access: | https://riunet.upv.es/handle/10251/233846 |
| Access level: | Open access |
| Keywords: | RIR; Deep learning; Signal processing; Gen AI; 03. Ensure healthy lives and promote well-being for all at all ages; 08. Promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all; 09. Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation |
| Abstract: | Generating accurate room impulse responses (RIRs) remains challenging, particularly with regard to phase estimation. Building on previous work that used encoder-decoder deep learning architectures, this paper investigates advanced techniques to improve phase prediction accuracy. We propose and evaluate several enhanced U-Net models, including variants with a variational autoencoder (VAE) bottleneck and with different input-conditioning methods for spatial and room parameters (embedding layers vs. normalized dense layers). A key focus is the comparison between predicting the direct phase and predicting the differential phase. Furthermore, we analyze the impact of using mean absolute error (MAE) versus mean squared error (MSE) for the magnitude component of the loss function. The study also explores the efficacy of applying the Griffin-Lim algorithm as a post-processing step to refine the phase estimated by the networks. Performance is evaluated on a real RIR dataset, comparing the different model architectures, information-vector encoding strategies, phase targets (direct vs. differential), loss functions, and the contribution of phase recovery algorithms to overall RIR fidelity. The results provide insights into effective strategies for enhancing phase generation in data-driven RIR synthesis. |
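
The abstract contrasts predicting the direct phase with predicting a differential phase, and MAE with MSE for the magnitude loss, but the record does not define either target precisely. The sketch below assumes one common definition of differential phase: the wrapped bin-to-bin phase difference along the frequency axis of a single spectral frame, with the first bin keeping its absolute phase, so that the direct phase can be recovered by cumulative summation. The function names (`wrap`, `to_differential`, `from_differential`, `mae`, `mse`) are hypothetical illustrations, not the authors' code.

```python
import math


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def to_differential(phase):
    """Convert direct phase (one value per frequency bin) to differential phase.

    Assumption (hypothetical definition): the first bin keeps its absolute
    phase and every later bin stores the wrapped difference to the previous
    bin, i.e. differences are taken along the frequency axis.
    """
    return [wrap(phase[0])] + [wrap(b - a) for a, b in zip(phase, phase[1:])]


def from_differential(dphase):
    """Invert to_differential: cumulative sum of the differences, re-wrapped."""
    out, acc = [], 0.0
    for d in dphase:
        acc += d
        out.append(wrap(acc))
    return out


def mae(pred, target):
    """Mean absolute error between two equal-length magnitude sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mse(pred, target):
    """Mean squared error; squaring weights large errors more than MAE does."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


if __name__ == "__main__":
    # Direct phase of a pure delay in an n-point DFT: a linear ramp that wraps.
    n, delay = 64, 5
    phase = [wrap(-2 * math.pi * k * delay / n) for k in range(n)]
    dphase = to_differential(phase)
    # Every bin after the first holds the same step, -2*pi*delay/n.
    print([round(x, 3) for x in dphase[1:4]])
    # Largest circular error of the round trip (near floating-point precision).
    print(max(abs(wrap(a - b)) for a, b in zip(from_differential(dphase), phase)))
```

For a pure delay, the direct phase is a linear ramp that wraps into a sawtooth across frequency, while the differential phase under this definition is a single constant, which illustrates why the two targets can look very different to a network. Whether the paper differences along frequency or time, and how it treats the first bin, is not stated in this record. On the loss side, MSE squares each error, so one large magnitude error costs more under MSE than under MAE.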