Learning to exploit the prior network knowledge for weakly supervised semantic segmentation

Training a convolutional neural network for semantic segmentation typically requires collecting a large amount of accurate pixel-level annotations and is a hard and expensive task. In contrast, simple image tags are easier to gather. In this paper, we introduce a novel weakly supervised semantic seg...

Descripción completa

Detalles Bibliográficos
Autores: Redondo Cabrera, Carolina, Baptista Ríos, Marcos, López Sastre, Roberto Javier|||0000-0002-2477-0152
Tipo de recurso: artículo
Fecha de publicación:2019
País:España
Institución:Universidad de Alcalá (UAH)
Repositorio:e_Buah Biblioteca Digital Universidad de Alcalá
Idioma:inglés
OAI Identifier:oai:ebuah.uah.es:10017/63249
Acceso en línea:http://hdl.handle.net/10017/63249
https://dx.doi.org/10.1109/TIP.2019.2901393
Access Level:acceso abierto
Palabra clave:Semantic segmentation
Weakly supervised
Deep learning
Telecomunicaciones
Telecommunication
Descripción
Sumario:Training a convolutional neural network for semantic segmentation typically requires collecting a large amount of accurate pixel-level annotations and is a hard and expensive task. In contrast, simple image tags are easier to gather. In this paper, we introduce a novel weakly supervised semantic segmentation model which is able to learn from image labels and just image labels. Our model uses the prior knowledge of a network trained for image recognition, employing these image annotations as an attention mechanism to identify semantic regions in the images. We then present a methodology that builds accurate class-specific segmentation masks from these regions, where neither external objectness nor saliency algorithms are required. We describe how to incorporate this mask generation strategy into a fully end-to-end trainable process, where the network jointly learns to classify and segment images. Our experiments on PASCAL VOC 2012 dataset show that exploiting these generated class-specific masks in conjunction with our novel end-to-end learning process outperforms several recent weakly supervised semantic segmentation methods that use image tags only, and even some models that leverage additional supervision or training data.