Improving slow-moving object detection in complex environments using a feature pooling enhanced encoder-decoder model
| Authors: | , , , |
|---|---|
| Resource type: | article |
| Publication date: | 2025 |
| Country: | Spain |
| Institution: | Universitat Autònoma de Barcelona |
| Repository: | Dipòsit Digital de Documents de la UAB |
| Language: | English |
| OAI Identifier: | oai:ddd.uab.cat:322514 |
| Online access: | https://ddd.uab.cat/record/322514 https://dx.doi.org/urn:doi:10.5565/rev/elcvia.2023 |
| Access level: | open access |
| Keywords: | Background subtraction; Deep neural network; Transfer learning; Slow moving object; Feature pooling framework; Encoder-decoder type network |
| Summary: | The ability to detect moving objects is of great importance in a wide range of visual surveillance systems, where it plays a vital role in maintaining security and ensuring effective monitoring. The primary aim of such systems is to detect objects in motion while coping with real-world challenges. Despite the existence of numerous methods, there remains room for improvement, particularly for video sequences with slowly moving objects and for unfamiliar video environments. When a slow-moving object stays confined to a small area, many traditional methods fail to detect the entire object. A spatial-temporal framework is an effective solution, and the selection of the temporal, spatial, and fusion algorithms is crucial for detecting slow-moving objects. This article addresses the detection of slowly moving objects in challenging videos with an encoder-decoder architecture that combines a modified VGG-16 model with a feature pooling framework. Several novel aspects characterize the proposed algorithm. It uses a pre-trained, modified VGG-16 network as the encoder, employing transfer learning to enhance model efficacy. The encoder is designed with a reduced number of layers and incorporates skip connections to extract the fine and coarse-scale features that are crucial for local change detection. The feature pooling framework (FPF) combines different layers, including max pooling, convolutional, and several atrous convolutional layers with varying sampling rates. This integration preserves features at different scales and dimensions, ensuring their representation across a wide range of scales. The decoder network comprises stacked convolutional layers that map the features back to image space. The performance of the developed technique is compared with various existing methods, including CMRM, Hybrid algorithm, Fast valley, EPMCB, and MODCVS, through both subjective and objective analyses. It demonstrates superior performance, with an average F-measure (AF) value of 98.86% and a lower average misclassification error (AMCE) value of 0.85. Furthermore, the algorithm's effectiveness is validated on Imperceptible Video Configuration video setups, where it also exhibits superior performance. |
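
The summary reports results as an average F-measure and an average misclassification error. As a rough illustration of how such per-frame scores could be computed from a predicted foreground mask and a ground-truth mask, here is a minimal Python sketch. The function names (`confusion_counts`, `f_measure`, `misclassification_error`) are hypothetical. The F-measure follows the standard harmonic mean of precision and recall. The misclassification error formula (percentage of wrongly labelled pixels) is an assumption, since the record does not state the exact definition the authors use.

```python
def confusion_counts(pred, truth):
    """Count (TP, FP, FN, TN) over two equally sized binary masks.

    Masks are lists of rows; 1 marks a moving-object pixel, 0 background.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def misclassification_error(tp, fp, fn, tn):
    """Assumed definition: percentage of pixels labelled incorrectly."""
    total = tp + fp + fn + tn
    return 100.0 * (fp + fn) / total if total else 0.0


# Toy 2x4 frame: one missed object pixel and one false alarm.
pred = [[1, 1, 0, 0],
        [0, 1, 0, 0]]
truth = [[1, 1, 1, 0],
         [0, 0, 0, 0]]

tp, fp, fn, tn = confusion_counts(pred, truth)
print("F-measure:", f_measure(tp, fp, fn))
print("Misclassification error (%):", misclassification_error(tp, fp, fn, tn))
```

Averaging these per-frame values over a sequence would give figures comparable in kind to the AF and AMCE values quoted in the summary, under the stated assumption about the error definition.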