Monocular depth map estimation based on a multi-scale deep architecture and curvilinear saliency feature boosting
| Authors: | , , , , |
|---|---|
| Format: | article |
| Publication date: | 2022 |
| Country: | Spain |
| Resources: | Universidad Autónoma de Madrid |
| Repository: | Biblos-e Archivo. Repositorio Institucional de la UAM |
| Language: | English |
| OAI Identifier: | oai:repositorio.uam.es:10486/711445 |
| Online access: | http://hdl.handle.net/10486/711445 https://dx.doi.org/10.1007/s00521-022-07663-x |
| Access Level: | open access |
| Keywords: | Monocular depth map estimation; Deep autoencoders; Multi-scale networks; Curvilinear saliency; Telecommunications |
| Abstract: | Estimating depth from a monocular camera is essential for many applications, including scene understanding and reconstruction, robot vision, and self-driving cars. However, generating depth maps from single RGB images remains a challenge, because object shapes must be inferred from intensity images that are strongly affected by viewpoint changes, texture content, and lighting conditions. As a result, most current solutions produce blurry approximations of low-resolution depth maps. We propose a novel depth map estimation technique based on an autoencoder network. This network is endowed with a multi-scale architecture and a multi-level depth estimator that preserve high-level information extracted from coarse feature maps as well as detailed local information present in fine feature maps. Curvilinear saliency, which is related to curvature estimation, is exploited as a loss function to boost depth accuracy at object boundaries and improve the quality of the estimated high-resolution depth maps. We evaluate our model on the public NYU Depth v2 and Make3D datasets. The proposed model outperforms the state of the art on both datasets, achieving an accuracy of 86% and performing especially well at preserving object boundaries and small 3D structures. The code of the proposed model is publicly available at https://github.com/SaddamAbdulrhman/MDACSFB. |
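
The abstract reports an accuracy of 86% without naming the metric. On NYU Depth v2, depth estimators are commonly scored with the threshold accuracy, the fraction of pixels whose ratio max(pred/gt, gt/pred) stays below 1.25. The sketch below assumes that metric for illustration only; the record does not confirm that the 86% figure refers to it. The function name `threshold_accuracy` and the sample arrays are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def threshold_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) below `threshold`.

    Assumption: this is the common delta < 1.25 depth metric, not
    necessarily the one behind the 86% reported in the abstract.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Pixels with non-positive depth carry no usable measurement; skip them.
    valid = (gt > 0) & (pred > 0)
    if not valid.any():
        raise ValueError("no valid pixels to evaluate")
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))


# Hypothetical 2x2 depth maps in metres; 0.0 marks a missing ground-truth value.
gt = np.array([[1.0, 2.0], [4.0, 0.0]])
pred = np.array([[1.1, 3.0], [4.5, 2.0]])
score = threshold_accuracy(pred, gt)
print(f"threshold accuracy: {score:.3f}")
```

Masking invalid pixels before taking the ratio avoids division by zero, which matters for datasets whose ground-truth depth maps contain holes.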