BEDPI-DL: Built Environment objects Detection on Panoramic Images of Bogota with Deep Learning models

Bibliographic Details
Author: Escallón Páez, Felipe
Format: master's thesis
Status: Accepted version for publication
Publication Date: 2023
Country: Colombia
Institution: Universidad de los Andes
Repository: Séneca: repositorio Uniandes
Language: English
OAI Identifier: oai:repositorio.uniandes.edu.co:1992/64266
Online Access: http://hdl.handle.net/1992/64266
Access Level: Open access
Keywords: Built environment
Convolutional neural networks
Deep learning
Machine Learning
Object detection
Transformers
Engineering
Description
Summary: Pedestrian injury research is an active area that requires significant investment yet yields low-efficiency results, because there are no automatic methods to replace the manual approach, which limits coverage across all geographical points of a city. With advances in Artificial Intelligence and Computer Vision, especially in Deep Learning methods, the goal is to detect with high accuracy 27 selected built environment objects that could either prevent or lead to pedestrian injuries at a specific geographical point. This work proposes and explores a dataset of panoramic images of Bogotá annotated with built environment objects, together with a baseline Deep Learning model. The proposed metric for evaluating model performance is mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.25, since the presence of the objects matters more than their exact localization. We annotate 992 replicas of the dataset's images and obtain a human mAP of 72.1%. Furthermore, we experiment with Faster R-CNN and Deformable DETR models, the latter both with and without input data transformations. Our final baseline model is a two-stage Deformable DETR trained for 60 epochs with random horizontal flips and random crops, using a resolution of 1950×6490 for both the resized complete images and the crops; it obtains an mAP of 45.6% at an IoU threshold of 0.25.
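To illustrate why a lenient IoU threshold of 0.25 rewards detecting an object's presence over localizing it precisely, here is a minimal Python sketch of per-class Average Precision and mAP. It is not the thesis's evaluation code. It assumes boxes given as (x1, y1, x2, y2) in pixels, greedy matching of score-sorted detections to the best still-unmatched ground-truth box, and all-point interpolated AP as in PASCAL VOC. The thesis's exact protocol (for example, COCO-style 101-point interpolation) may differ, and the function names `iou`, `average_precision`, and `mean_ap` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)
Pred = Tuple[str, float, Box]            # (image_id, confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Pred], gts: Dict[str, List[Box]],
                      iou_thr: float = 0.25) -> float:
    """All-point interpolated AP for a single object class (hypothetical helper)."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    is_tp: List[int] = []
    # Process detections from most to least confident.
    for img, _, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if not used[img][j] and overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= iou_thr:
            used[img][best_j] = True  # each ground-truth box is matched at most once
            is_tp.append(1)
        else:
            is_tp.append(0)           # unmatched or duplicate detection: false positive
    recalls, precisions = [], []
    tp = fp = 0
    for flag in is_tp:
        tp += flag
        fp += 1 - flag
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def mean_ap(preds_by_class: Dict[str, List[Pred]],
            gts_by_class: Dict[str, Dict[str, List[Box]]],
            iou_thr: float = 0.25) -> float:
    """Mean of per-class APs over classes that have ground truth."""
    aps = [average_precision(preds_by_class.get(c, []), gts, iou_thr)
           for c, gts in gts_by_class.items()]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, a detection shifted by half its width relative to a 100×100 ground-truth box has an IoU of 1/3. It counts as a true positive at the 0.25 threshold (AP of 1.0 for that single object) but as a false positive at the more common 0.5 threshold (AP of 0.0).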