Vanishing Mask Refinement in Semi-Supervised Video Segmentation

Bibliographic details
Authors: Pita, Javier; Llerena Caña, Juan Pedro (ORCID 0000-0002-3476-6261); Patricio Guisado, Miguel Ángel; Berlanga, Antonio; Usero Aragonés, Luis (ORCID 0000-0001-8658-9992)
Format: article
Publication date: 2024
Country: Spain
Source: Universidad de Alcalá (UAH)
Repository: e_Buah Biblioteca Digital Universidad de Alcalá
Language: English
OAI identifier: oai:ebuah.uah.es:10017/64659
Online access: http://hdl.handle.net/10017/64659
https://dx.doi.org/10.2139/ssrn.4876026
Access level: open access
Keywords: Video Object Segmentation
Long-Term Videos
Deep Learning
Computer science
Description
Abstract: This paper presents a novel architecture, Video Object Segmentation Enhanced with Segment Anything Model, that improves semi-supervised Video Object Segmentation (VOS) models by refining each output object mask with foundation models. Video Object Segmentation is a major focus of computer vision research, and its main challenges are changes in object appearance, occlusions, camera motion, and changes in perspective. This study explores the different inputs accepted by the Segment Anything Model (SAM) and, through extensive testing, identifies the best-performing configuration for the proposed model. Results on established video segmentation datasets show that the proposal improves the mask outputs of the base model on single-object, multi-object, and long-video datasets, and lays the groundwork for further exploration of combining these two architectures.
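The abstract does not say how each VOS output mask is turned into SAM inputs or how SAM's answer is merged back. As a rough illustration only, the sketch below assumes a box-plus-point prompting scheme: it derives a bounding box and one foreground point from a binary mask, and then applies a simple fallback rule that keeps SAM's best-scoring mask only if it still overlaps the base mask. The function names `mask_to_sam_prompts`, `mask_iou`, and `select_refined` and the `min_iou` threshold are hypothetical and are not taken from the paper. The prompt dictionary keys match the keyword arguments of `SamPredictor.predict` in Meta's `segment-anything` package (`point_coords` in (x, y) order, `point_labels` with 1 for foreground, `box` in XYXY order).

```python
import numpy as np


def mask_to_sam_prompts(mask: np.ndarray):
    """Derive SAM-style prompts from a binary H x W object mask.

    Returns None when the mask is empty (e.g. the object has vanished),
    since there is nothing to prompt with. Hypothetical helper.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    # Tight bounding box in XYXY pixel coordinates.
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()], dtype=np.float32)
    # Foreground point: the mask pixel nearest the centroid, so the point
    # lies inside the object even when the shape is not convex.
    cx, cy = xs.mean(), ys.mean()
    i = int(np.argmin((xs - cx) ** 2 + (ys - cy) ** 2))
    point_coords = np.array([[xs[i], ys[i]]], dtype=np.float32)
    point_labels = np.array([1], dtype=np.int32)
    return {"box": box, "point_coords": point_coords, "point_labels": point_labels}


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks (0.0 if both are empty)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0


def select_refined(base_mask, sam_masks, sam_scores, min_iou=0.5):
    """Hypothetical fusion rule: take SAM's highest-scoring candidate, but
    fall back to the base VOS mask if the two disagree too much."""
    base = base_mask.astype(bool)
    candidate = sam_masks[int(np.argmax(sam_scores))].astype(bool)
    return candidate if mask_iou(candidate, base) >= min_iou else base
```

In use, the prompts would be passed to a predictor that has already been given the current frame, for example `predictor.set_image(frame)` followed by `masks, scores, _ = predictor.predict(**prompts, multimask_output=True)`, and then `select_refined(base_mask, masks, scores)`. The IoU fallback is one simple way to stop SAM from replacing a mask with a different object. The paper's actual input configuration was chosen experimentally and may differ.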