Real-time human action recognition using raw depth video-based recurrent neural networks

Sánchez Caballero, Adrián (ORCID 0000-0002-3395-7568); Fuentes Jiménez, David (ORCID 0000-0001-6424-4782); Losada Gutiérrez, Cristina (ORCID 0000-0001-9545-327X)

Bibliographic Details
Authors:	Sánchez Caballero, Adrián (ORCID 0000-0002-3395-7568); Fuentes Jiménez, David (ORCID 0000-0001-6424-4782); Losada Gutiérrez, Cristina (ORCID 0000-0001-9545-327X)
Resource type:	article
Publication date:	2022
Country:	Spain
Institution:	Universidad de Alcalá (UAH)
Repository:	e_Buah Biblioteca Digital Universidad de Alcalá
Language:	English
OAI Identifier:	oai:ebuah.uah.es:10017/57907
Online access:	http://hdl.handle.net/10017/57907; https://dx.doi.org/10.1007/s11042-022-14075-5
Access level:	open access
Keywords:	ConvLSTM; Action recognition; Depth maps; Video-surveillance; Electronics

Description
Summary:	This work proposes and compares two approaches for real-time human action recognition (HAR) from raw depth video sequences. Both are based on the convolutional long short-term memory unit (ConvLSTM) and differ in their architecture and in how they learn long-term dependencies. The first uses a video-length adaptive input data generator (stateless), whereas the second exploits the stateful capability of general recurrent neural networks, applied here to the particular case of HAR. This stateful property allows the model to accumulate discriminative patterns from previous frames without compromising computer memory. Furthermore, since the proposal uses only depth information, HAR is carried out while preserving the privacy of the people in the scene, because their identities cannot be recognized. Both neural networks have been trained and tested on the large-scale NTU RGB+D dataset. Experimental results show that the proposed models achieve competitive recognition accuracies at a lower computational cost than state-of-the-art methods, and that, in the particular case of videos, the rarely used stateful mode of recurrent neural networks significantly improves on the accuracy obtained with the standard mode. The recognition accuracies obtained are 75.26% (cross-subject, CS) and 75.45% (cross-view, CV) for the stateless model, with an average processing time per video of 0.21 s, and 80.43% (CS) and 79.91% (CV) with 0.89 s for the stateful one.
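The stateful idea mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: a toy scalar tanh cell stands in for a ConvLSTM cell, and the names `rnn_step`, `run_chunk`, `chunks`, the weights, and the per-frame values in `video` are all hypothetical. The sketch only shows that carrying the hidden state from one chunk of frames to the next reproduces a single pass over the whole video while holding just one chunk in memory at a time, whereas resetting the state for every chunk discards what earlier frames contributed. The stateless model described in the record uses a video-length adaptive input generator, which this sketch does not attempt to reproduce.

```python
import math


def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar recurrent cell (assumed stand-in for a ConvLSTM cell)."""
    return math.tanh(w_h * h + w_x * x)


def run_chunk(frames, h0=0.0):
    """Run the cell over a chunk of frames and return the final hidden state."""
    h = h0
    for x in frames:
        h = rnn_step(h, x)
    return h


def chunks(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements each."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


# Hypothetical per-frame features; a real model would receive depth maps.
video = [0.9, -0.4, 0.1, 0.7, -0.8, 0.3, 0.2, -0.1]

# Reference: the whole video processed in a single pass.
h_full = run_chunk(video)

# Stateful processing: the hidden state is carried from chunk to chunk,
# so only one short chunk has to be held in memory at a time.
h_stateful = 0.0
for chunk in chunks(video, 2):
    h_stateful = run_chunk(chunk, h0=h_stateful)

# State reset on every chunk: each chunk starts from a zero state,
# so information from earlier frames is lost.
h_reset = [run_chunk(chunk) for chunk in chunks(video, 2)]

print("full pass:     ", h_full)
print("stateful:      ", h_stateful)
print("reset (last):  ", h_reset[-1])
```

In this toy setting the stateful result matches the full pass, while the last reset chunk ends in a different state because it never saw the first six frames.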
