Learning to exploit the prior network knowledge for weakly supervised semantic segmentation
Training a convolutional neural network for semantic segmentation typically requires collecting a large amount of accurate pixel-level annotations and is a hard and expensive task. In contrast, simple image tags are easier to gather. In this paper, we introduce a novel weakly supervised semantic seg...
| Autores: | , , |
|---|---|
| Tipo de documento: | artigo |
| Data de publicação: | 2019 |
| País: | España |
| Recursos: | Universidad de Alcalá (UAH) |
| Repositório: | e_Buah Biblioteca Digital Universidad de Alcalá |
| Idioma: | inglês |
| OAI Identifier: | oai:ebuah.uah.es:10017/63249 |
| Acesso em linha: | http://hdl.handle.net/10017/63249 https://dx.doi.org/10.1109/TIP.2019.2901393 |
| Access Level: | Acceso aberto |
| Palavra-chave: | Semantic segmentation Weakly supervised Deep learning Telecomunicaciones Telecommunication |
| Resumo: | Training a convolutional neural network for semantic segmentation typically requires collecting a large amount of accurate pixel-level annotations and is a hard and expensive task. In contrast, simple image tags are easier to gather. In this paper, we introduce a novel weakly supervised semantic segmentation model which is able to learn from image labels and just image labels. Our model uses the prior knowledge of a network trained for image recognition, employing these image annotations as an attention mechanism to identify semantic regions in the images. We then present a methodology that builds accurate class-specific segmentation masks from these regions, where neither external objectness nor saliency algorithms are required. We describe how to incorporate this mask generation strategy into a fully end-to-end trainable process, where the network jointly learns to classify and segment images. Our experiments on PASCAL VOC 2012 dataset show that exploiting these generated class-specific masks in conjunction with our novel end-to-end learning process outperforms several recent weakly supervised semantic segmentation methods that use image tags only, and even some models that leverage additional supervision or training data. |
|---|