Continual Learning of Hand Gestures for Human-Robot Interaction

Bibliographic Details
Author: Cucurull Salamero, Xavier
Resource type: master's thesis
Publication date: 2022
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/376650
Online access: https://hdl.handle.net/2117/376650
Access level: open access
Keywords: Computer vision
Human-computer interaction
Artificial intelligence
Visió per computador
aprenentatge continu
interacció humà-robot
reconeixement de gestos
aprenentatge màquina
xarxes neuronals
intel·ligència artificial
Computer Vision
Continual Learning
Human-Robot Interaction
Hand Gesture Recognition
Machine Learning
Deep Learning
Artificial Intelligence
Visió per ordinador
Interacció persona-ordinador
Intel·ligència artificial
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial
Description
Summary: Human communication is multimodal. For years, natural language processing has been studied as a form of human-machine or human-robot interaction. In recent years, computer vision techniques have been applied to the recognition of static and dynamic gestures, and progress is also being made in sign language recognition. The typical way to train a machine learning algorithm for a classification task is to provide training examples for every class the model needs to identify. In a real-world scenario, such as the use of assistive robots, it is useful to learn new concepts from interaction. However, unlike biological brains, artificial neural networks suffer from catastrophic forgetting and, as a result, are not good at incrementally learning new classes. In this thesis, the HAnd Gesture Incremental Learning (HAGIL) framework is proposed as a method to incrementally learn to classify static hand gestures. We show that HAGIL is able to incrementally learn up to 36 new symbols using only 5 samples for each old symbol, achieving a final average accuracy of over 90%. In addition, the incremental training time is reduced to 10% of the time required when training on all available data.
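
The summary states that only 5 samples are retained for each old symbol when new symbols are learned. The following is a minimal sketch of that general idea (a small per-class memory combined with the full data of the new classes), not the HAGIL implementation itself. The function names `select_exemplars` and `build_incremental_set`, the random selection strategy, and the toy data are all assumptions made for illustration; the record does not say how the thesis selects or uses its stored samples.

```python
import random
from collections import defaultdict


def select_exemplars(samples, labels, per_class=5, seed=0):
    """Keep at most `per_class` samples for each label.

    Random selection is an assumption for this sketch; the thesis may
    choose its stored samples differently.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    return {y: rng.sample(xs, min(per_class, len(xs)))
            for y, xs in by_class.items()}


def build_incremental_set(memory, new_samples, new_labels):
    """Combine stored exemplars of old classes with all new-class data."""
    data = [(x, y) for y, xs in memory.items() for x in xs]
    data.extend(zip(new_samples, new_labels))
    return data


# Toy data: integers stand in for gesture images (hypothetical).
old_x = list(range(100))
old_y = ["A"] * 50 + ["B"] * 50
memory = select_exemplars(old_x, old_y, per_class=5)

new_x = list(range(100, 150))
new_y = ["C"] * 50
train_set = build_incremental_set(memory, new_x, new_y)
```

In this toy run the incremental training set holds 60 items (5 each for the two old classes plus 50 for the new class) instead of the 150 that full retraining would use. A smaller training set of this kind is one plausible reason incremental training could take less time, although the record does not attribute the reported reduction to any single cause.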