Vanishing Mask Refinement in Semi-Supervised Video Segmentation


Bibliographic Details
Authors: Pita, Javier; Llerena Caña, Juan Pedro (ORCID: 0000-0002-3476-6261); Patricio Guisado, Miguel Ángel; Berlanga, Antonio; Usero Aragonés, Luis (ORCID: 0000-0001-8658-9992)
Resource type: article
Publication date: 2024
Country: Spain
Institution: Universidad de Alcalá (UAH)
Repository: e_Buah Biblioteca Digital Universidad de Alcalá
Language: English
OAI Identifier: oai:ebuah.uah.es:10017/64659
Online access: http://hdl.handle.net/10017/64659
https://dx.doi.org/10.2139/ssrn.4876026
Access level: open access
Keywords: Video Object Segmentation
Long-Term Videos
Deep Learning
Informatics
Computer science
Description
Summary: This paper presents a novel architecture, Video Object Segmentation Enhanced with Segment Anything Model, which aims to improve semi-supervised Video Object Segmentation models by refining each output object mask with foundation models. Video Object Segmentation is a major focus of computer vision, where variations in object appearance, occlusions, camera movement, and perspective changes are the main challenges to overcome. This study explores the diverse inputs accepted by the Segment Anything Model and, through extensive testing, establishes the optimal configuration for the proposed model. Results on established video segmentation datasets show that the proposal improves the mask outputs of the base model on single-object, multi-object, and long-video datasets, and lays the groundwork for future work combining these two architectures.
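
As an illustration of what the "inputs accepted by the Segment Anything Model" can look like in practice, the sketch below derives two common prompt types, a bounding box and a foreground point, from a binary mask such as a base segmentation model might output. It is not the authors' configuration: the function name `mask_to_prompts`, the list-of-lists mask representation, and the choice of the foreground pixel nearest the centroid are all assumptions made for this example.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # rows of 0/1 values (hypothetical representation)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive
Point = Tuple[int, int]  # (x, y)


def mask_to_prompts(mask: Mask) -> Optional[Tuple[Box, Point]]:
    """Derive a box prompt and a point prompt from a binary mask.

    Returns None when the mask has no foreground pixels.
    """
    # Collect (x, y) coordinates of every foreground pixel.
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None  # nothing to refine in this frame

    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    box = (min(xs), min(ys), max(xs), max(ys))

    # The centroid can fall outside a non-convex object, so use the
    # foreground pixel closest to it as the point prompt (an assumption).
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    point = min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return box, point


example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
print(mask_to_prompts(example_mask))  # ((1, 1, 2, 2), (1, 1))
```

The resulting box and point could then be passed to a promptable segmenter as candidate inputs. Which prompt types, or which combination of them, work best is exactly the kind of question the summary says the study settles through testing.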