Precision and power grip detection in egocentric hand-object interaction using machine learning
| Author: | |
|---|---|
| Format: | master's thesis |
| Publication date: | 2023 |
| Country: | Spain |
| Resources: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/394369 |
| Online access: | https://hdl.handle.net/2117/394369 |
| Access Level: | open access |
| Keywords: | Deep learning; Pattern recognition systems; Computer vision; grasp recognition; hand landmarks; object recognition; depth estimation; rehabilitation; egocentric perspective; UPC subject areas::Computer science::Artificial intelligence |
| Abstract: | This project was carried out in Yverdon-les-Bains, Switzerland, as a collaboration between the University of Applied Sciences and Arts Western Switzerland (HEIG-VD / HES-SO) and the Centre Hospitalier Universitaire Vaudois (CHUV) in Lausanne. It focuses on detecting grasp types from an egocentric point of view. The objective is to determine accurately which kind of grasp (power, precision, or none) a user performs, based on images captured from their perspective. A successful grasp detection system would greatly benefit the evaluation of patients undergoing upper limb rehabilitation. Various computer vision frameworks were used to detect hands, interacting objects, and depth information in the images. These extracted features were then fed into deep learning models for grasp prediction. Both custom recorded datasets and open-source datasets, such as EPIC-KITCHENS and the Yale dataset, were used for training and evaluation. The project achieved satisfactory results in detecting grasp types from an egocentric viewpoint, with a macro-averaged F1 score of 0.76 on the final test set. The use of diverse videos, including custom recordings and publicly available datasets, enabled comprehensive training and evaluation. A robust pipeline was developed through iterative refinement, extracting the key features from each frame to predict grasp types accurately. Furthermore, data mixtures were proposed to increase dataset size and improve the generalization performance of the models, and they played a crucial role in the project's final stages. |
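
The abstract reports its headline result as a macro F1 score over three classes (power, precision, none). For readers unfamiliar with the metric, the following is a minimal sketch of how a macro-averaged F1 can be computed: the F1 of each class is calculated separately, then the three values are averaged with equal weight, so a rare class counts as much as a common one. The function name `macro_f1` and the toy label sequences are illustrative assumptions and are not taken from the thesis or its data.

```python
# Class names follow the abstract; everything else here is hypothetical.
LABELS = ("power", "precision", "none")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up per-frame labels, purely for illustration.
y_true = ["power", "power", "precision", "none", "none", "precision"]
y_pred = ["power", "precision", "precision", "none", "power", "precision"]

print(round(macro_f1(y_true, y_pred), 3))
```

In this toy case the per-class F1 values are 0.5 (power), 0.8 (precision), and about 0.667 (none), and the macro score is their plain average.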