Improving slow-moving object detection in complex environments using a feature pooling enhanced encoder-decoder model

Bibliographic Details
Authors: Panigrahi, Upasana; Sahoo, Prabodh Kumar; Kumar Panda, Manoj; Panda, Ganapati
Resource type: article
Publication date: 2025
Country: Spain
Institution: Universitat Autònoma de Barcelona
Repository: Dipòsit Digital de Documents de la UAB
Language: English
OAI Identifier: oai:ddd.uab.cat:322514
Online access: https://ddd.uab.cat/record/322514
https://dx.doi.org/urn:doi:10.5565/rev/elcvia.2023
Access level: open access
Keywords: Background subtraction
Deep neural network
Transfer learning
Slow moving object
Feature pooling framework
Encoder-decoder type network
Description
Summary: Detecting moving objects is central to a wide range of visual surveillance systems, where it supports security and effective monitoring. Such systems must detect objects in motion while coping with real-world challenges. Despite the many existing methods, there remains room for improvement, particularly for video sequences with slowly moving objects and for unfamiliar video environments. When a slow-moving object stays confined to a small area, many traditional methods fail to detect the entire object. A spatial-temporal framework is an effective solution, and the choice of temporal, spatial, and fusion algorithms is crucial for detecting slow-moving objects. This article addresses the detection of slowly moving objects in challenging videos with an encoder-decoder architecture that combines a modified VGG-16 model with a feature pooling framework. The proposed algorithm has several novel aspects. It uses a pre-trained, modified VGG-16 network as the encoder, employing transfer learning to improve model efficacy. The encoder has a reduced number of layers and incorporates skip connections to extract the fine- and coarse-scale features needed for local change detection. The feature pooling framework (FPF) combines max pooling, convolutional, and several atrous convolutional layers with varying sampling rates. This combination preserves features at different scales and dimensions, ensuring their representation across a wide range of scales. The decoder network comprises stacked convolutional layers that map the features back to image space.
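The multi-rate idea behind the feature pooling framework can be illustrated with a small sketch. This is not the authors' implementation: it is a one-dimensional, pure-Python toy in which the function names (`atrous_conv1d`, `max_pool1d`, `feature_pool`, `effective_kernel_size`), the all-ones 3-tap kernel, and the rates 1, 2, and 4 are all assumptions chosen for illustration. It shows only how atrous (dilated) branches with different rates see different spatial extents of the same input, alongside a max pooling branch.

```python
# Illustrative 1-D sketch only; kernel, rates, and names are assumptions,
# not the configuration described in the article.

def atrous_conv1d(signal, kernel, rate):
    """'Same'-length atrous (dilated) convolution with zero padding.

    Kernel taps are spaced `rate` samples apart. Assumes an odd-length kernel.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * rate  # input index seen by this tap
            if 0 <= j < len(signal):   # taps outside the signal count as zero
                acc += weight * signal[j]
        out.append(acc)
    return out


def max_pool1d(signal, size=3):
    """Stride-1 max pooling over a centred window, keeping the input length."""
    half = size // 2
    return [max(signal[max(0, i - half): i + half + 1])
            for i in range(len(signal))]


def feature_pool(signal, kernel=(1.0, 1.0, 1.0), rates=(1, 2, 4)):
    """Return one max-pooled branch followed by one atrous branch per rate."""
    branches = [max_pool1d(signal)]
    branches += [atrous_conv1d(signal, kernel, r) for r in rates]
    return branches


def effective_kernel_size(kernel_size, rate):
    """Input span covered by a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for branch in feature_pool(impulse):
        print(branch)
    print([effective_kernel_size(3, r) for r in (1, 2, 4)])
```

With a 3-tap kernel, rates of 1, 2, and 4 cover 3, 5, and 9 input samples respectively without adding weights, which is the general reason atrous layers are used to capture context at several scales.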
The developed technique is compared with several existing methods, namely CMRM, Hybrid algorithm, Fast valley, EPMCB, and MODCVS, through both subjective and objective analyses. It achieves superior performance, with an average F-measure (AF) of 98.86% and a lower average misclassification error (AMCE) of 0.85. The algorithm's effectiveness is further validated on Imperceptible Video Configuration video setups, where it again exhibits superior performance.
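As a rough guide to how the two reported metrics are typically computed, the sketch below scores a predicted binary foreground mask against a ground-truth mask. The F-measure uses the standard definition (harmonic mean of precision and recall). The misclassification error is written here as the percentage of wrongly labelled pixels, which is an assumption: the record does not state the article's exact AMCE formula or its unit. The function names and the flattened-list mask format are likewise illustrative.

```python
# Illustrative sketch; the misclassification-error definition is an assumption.

def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two flattened binary masks of equal length."""
    tp = fp = fn = tn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f_measure(pred, truth):
    """F = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Both masks empty: treated as perfect agreement (a convention chosen here).
    return 2 * tp / denom if denom else 1.0


def misclassification_error(pred, truth):
    """Assumed definition: percentage of pixels labelled incorrectly."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return 100.0 * (fp + fn) / (tp + fp + fn + tn)


def average(values):
    """Mean over per-frame (or per-sequence) scores."""
    return sum(values) / len(values)


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0]
    truth = [1, 0, 0, 0, 1, 1]
    print(f_measure(pred, truth), misclassification_error(pred, truth))
```

Averaging `f_measure` and `misclassification_error` over all evaluated frames or sequences with `average` gives values analogous to the AF and AMCE figures quoted above, under the stated assumption about the error definition.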