Visual content-based web page categorization with deep transfer learning and metric learning.
| Authors: | , , |
|---|---|
| Resource type: | article |
| Status: | Published version |
| Publication date: | 2019 |
| Country: | Spain |
| Institution: | Universidad de Salamanca (USAL) |
| Repository: | GREDOS. Repositorio Institucional de la Universidad de Salamanca |
| OAI Identifier: | oai:gredos.usal.es:10366/157119 |
| Online access: | http://hdl.handle.net/10366/157119 |
| Access level: | open access |
| Keywords: | Web page categorization, Metric learning, Transfer learning, Deep learning, 1203.17 Computer Science |
| Abstract: | The growing amounts of online multimedia content challenge current search, recommendation and information retrieval systems. Information in the form of visual elements is highly valuable in a range of web mining tasks. However, mining these resources is difficult because of the complexity and variability of images and the cost of collecting datasets large enough to train accurate deep learning models. This paper proposes a novel framework for categorizing web pages on the basis of their visual content. It explores the joint application of a transfer learning strategy and metric learning techniques to build a Deep Convolutional Neural Network (DCNN) for feature extraction, even when training data is scarce. The experimental results show that the proposed approach outperforms state-of-the-art handcrafted image descriptors and achieves high categorization accuracy. In addition, we address the problem of over-time learning, so that the framework can learn to identify new web page categories as new labeled images are provided at test time. As a result, prior knowledge of the complete set of possible web categories is not necessary in the initial training phase. |
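The abstract says that metric learning lets the framework recognize new web page categories at test time without knowing the full category set during training. The sketch below illustrates one common way this can work. It is an illustration under stated assumptions, not the paper's method: the record does not name the metric learning loss or the classifier. It assumes a triplet margin loss, which pulls same-category embeddings together and pushes different-category embeddings apart. It also assumes a nearest-centroid classifier on the learned embeddings, to which a new category can be added simply by supplying a few labeled embeddings. The embedding vectors, category names and the `NearestCentroidClassifier` class are hypothetical stand-ins for DCNN features.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Assumed triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    It is zero once the positive is closer to the anchor than the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


class NearestCentroidClassifier:
    """Hypothetical classifier over learned embeddings.

    Each category is represented by the mean of its labeled embeddings,
    so new categories can be added at test time without retraining the
    feature extractor.
    """

    def __init__(self):
        self._sums = {}    # label -> running sum of embeddings
        self._counts = {}  # label -> number of embeddings seen

    def add_examples(self, label, embeddings):
        for e in embeddings:
            if label not in self._sums:
                self._sums[label] = [0.0] * len(e)
                self._counts[label] = 0
            self._sums[label] = [s + x for s, x in zip(self._sums[label], e)]
            self._counts[label] += 1

    def centroid(self, label):
        return [s / self._counts[label] for s in self._sums[label]]

    def predict(self, embedding):
        # Return the label whose centroid is closest to the query embedding.
        return min(self._sums, key=lambda lbl: euclidean(embedding, self.centroid(lbl)))


if __name__ == "__main__":
    clf = NearestCentroidClassifier()
    # Toy 2-D vectors standing in for DCNN image features (hypothetical).
    clf.add_examples("news", [[0.0, 0.0], [0.2, 0.0]])
    clf.add_examples("shopping", [[5.0, 5.0], [5.2, 4.8]])
    print(clf.predict([0.1, 0.1]))   # news
    # A new category appears later: add its examples, no retraining needed.
    clf.add_examples("sports", [[0.0, 10.0]])
    print(clf.predict([0.5, 9.5]))   # sports
```

In this design, the triplet loss would only be used while training the feature extractor. Adding a category afterwards just adds one centroid, which matches the abstract's claim that the full category set need not be known in the initial training phase.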