BEDPI-DL: Built Environment objects Detection on Panoramic Images of Bogota with Deep Learning models
| Field | Value |
|---|---|
| Author: | |
| Format: | master thesis |
| Status: | Accepted version for publication |
| Publication Date: | 2023 |
| Country: | Colombia |
| Institution: | Universidad de los Andes |
| Repository: | Séneca: repositorio Uniandes |
| Language: | English |
| OAI Identifier: | oai:repositorio.uniandes.edu.co:1992/64266 |
| Online Access: | http://hdl.handle.net/1992/64266 |
| Access Level: | Open access |
| Keyword: | Built environment; Convolutional neural networks; Deep learning; Machine learning; Object detection; Transformers; Engineering |
| Summary: | Pedestrian injuries are an active area of research, but studying them requires significant investment and yields low efficiency because no automatic methods exist to replace manual inspection, which limits coverage across the geographical points of a city. With advances in Artificial Intelligence and Computer Vision, especially in Deep Learning, the goal is to detect, with high accuracy, 27 selected built environment objects that could either prevent or lead to pedestrian injuries at a specific geographical point of a city. This work proposes and explores a dataset of panoramic images from Bogotá annotated with built environment objects, together with a baseline Deep Learning model. The proposed metric for evaluating the models' performance is the mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 0.25, because the presence of the objects matters more than their exact localization. We annotate 992 replicas of the dataset's images and obtain a human mAP of 72.1%. Furthermore, we experiment with Faster R-CNN and Deformable DETR models, the latter both with and without input data transformations. Our final baseline model is the two-stage Deformable DETR trained for 60 epochs with random horizontal flips and random crops at a resolution of 1950×6490 (for both the resized full images and the crops), obtaining an mAP of 45.6% at an IoU threshold of 0.25. |
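The evaluation metric named in the summary, mean Average Precision at an IoU threshold of 0.25, can be illustrated with a minimal Python sketch. It assumes boxes in `(x1, y1, x2, y2)` format, greedy PASCAL VOC-style matching of predictions to ground truth in descending score order, and all-point interpolated AP. The thesis's actual evaluation code (for example, a COCO-style evaluator) is not described in this record and may differ. The function names `iou`, `average_precision`, and `mean_average_precision` and the example boxes are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]
# Hypothetical prediction record: (image_id, confidence_score, box).
Pred = Tuple[str, float, Box]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Pred], gts: Dict[str, List[Box]], iou_thr: float = 0.25) -> float:
    """AP for one class: greedy VOC-style matching, all-point interpolation (assumed)."""
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp: List[int] = []
    # Visit predictions from most to least confident.
    for img, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt_box in enumerate(gts.get(img, [])):
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A hit needs IoU >= threshold with a ground truth not already claimed.
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp.append(1)
        else:
            tp.append(0)  # false positive: duplicate, poor overlap, or no object
    # Cumulative recall and precision along the ranked list.
    recalls: List[float] = []
    precisions: List[float] = []
    hits = 0
    for rank, hit in enumerate(tp, start=1):
        hits += hit
        recalls.append(hits / n_gt)
        precisions.append(hits / rank)
    # All-point interpolation: take the precision envelope, then sum the area
    # under the resulting step-shaped precision-recall curve.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(
    preds_by_class: Dict[str, List[Pred]],
    gts_by_class: Dict[str, Dict[str, List[Box]]],
    iou_thr: float = 0.25,
) -> float:
    """Mean of per-class APs over the classes listed in gts_by_class."""
    aps = [
        average_precision(preds_by_class.get(c, []), gts, iou_thr)
        for c, gts in gts_by_class.items()
    ]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Hypothetical example: one ground-truth box and a loosely placed prediction.
    gts = {"img1": [(0.0, 0.0, 10.0, 10.0)]}
    preds = [("img1", 0.9, (5.0, 0.0, 15.0, 10.0))]  # IoU = 50 / 150 = 1/3
    print(average_precision(preds, gts, iou_thr=0.25))  # 1.0: counted as a detection
    print(average_precision(preds, gts, iou_thr=0.5))   # 0.0: rejected at the stricter threshold
```

The example shows why a threshold of 0.25 favors presence over exact localization: a prediction that overlaps its target by only one third still counts as a detection, whereas at the common 0.5 threshold it would be a false positive.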