Precision and power grip detection in egocentric hand-object interaction using machine learning


Bibliographic details
Author: Huapaya Sierra, Rodrigo Arian
Format: master's thesis
Publication date: 2023
Country: Spain
Resources: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/394369
Online access: https://hdl.handle.net/2117/394369
Access level: open access
Keywords: Deep learning
Pattern recognition systems
Computer vision
grasp recognition
hand landmarks
object recognition
depth estimation
rehabilitation
egocentric perspective
UPC subject areas::Computer science::Artificial intelligence
Description
Abstract: This project was carried out in Yverdon-les-Bains, Switzerland, between the University of Applied Sciences and Arts Western Switzerland (HEIG-VD / HES-SO) and the Centre Hospitalier Universitaire Vaudois (CHUV) in Lausanne. It focuses on the detection of grasp types from an egocentric point of view. The objective is to determine accurately which kind of grasp (power, precision, or none) a user performs, based on images captured from their perspective. A working grasp detection system of this kind would greatly benefit the evaluation of patients undergoing upper limb rehabilitation. Various computer vision frameworks were used to detect hands, interacting objects, and depth information in the images. The extracted features were then fed into deep learning models for grasp prediction. Both custom recorded datasets and open-source datasets, such as EpicKitchen and the Yale dataset, were used for training and evaluation. The project achieved satisfactory results in detecting grasp types from an egocentric viewpoint, with a macro-averaged F1 score of 0.76 on the final test set. The use of diverse videos, including custom recordings and publicly available datasets, allowed comprehensive training and evaluation. A robust pipeline was developed through iterative refinement, extracting the key features from each frame in order to predict grasp types. Furthermore, data mixtures were proposed to increase dataset size and improve the generalization performance of the models, which played a crucial role in the project's final stages.
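
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-averaged F1 score can be computed for the three classes named in the abstract. The class labels come from the abstract; the function name `macro_f1` and the sample predictions are hypothetical and for illustration only, and the thesis may well have relied on a library implementation instead.

```python
from collections import Counter

# Class names taken from the abstract; everything else is illustrative.
LABELS = ("power", "precision", "none")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Hypothetical per-frame labels, not data from the thesis.
y_true = ["power", "power", "precision", "none"]
y_pred = ["power", "precision", "precision", "none"]
print(round(macro_f1(y_true, y_pred), 3))
```

Because every class contributes equally to the average, a model that performs poorly on a rare grasp type is penalized as much as one that performs poorly on a frequent type, which is one reason the macro average is often preferred for imbalanced class distributions.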