Visual content-based web page categorization with deep transfer learning and metric learning.

López Sánchez, Daniel; González Arrieta, María Angélica; Corchado Rodríguez, Juan Manuel

Bibliographic Details
Authors:	López Sánchez, Daniel; González Arrieta, María Angélica; Corchado Rodríguez, Juan Manuel
Resource type:	article
Status:	Published version
Publication date:	2019
Country:	Spain
Institution:	Universidad de Salamanca (USAL)
Repository:	GREDOS. Repositorio Institucional de la Universidad de Salamanca
OAI Identifier:	oai:gredos.usal.es:10366/157119
Online access:	http://hdl.handle.net/10366/157119
Access Level:	open access
Keywords:	Web page categorization; Metric learning; Transfer learning; Deep learning; 1203.17 Informática

Description
Summary:	The growing amount of online multimedia content challenges current search, recommendation, and information retrieval systems. Information in the form of visual elements is highly valuable in a range of web mining tasks. However, mining these resources is difficult because of the complexity and variability of images and the cost of collecting datasets large enough to train accurate deep learning models. This paper proposes a novel framework for categorizing web pages on the basis of their visual content. It does so by jointly applying a transfer learning strategy and metric learning techniques to build a Deep Convolutional Neural Network (DCNN) for feature extraction, even when training data is scarce. The experimental results show that the proposed approach outperforms state-of-the-art handcrafted image descriptors and achieves high categorization accuracy. In addition, we address the problem of over-time learning, so the proposed framework can learn to identify new web page categories as new labeled images are provided at test time. As a result, prior knowledge of the complete set of possible web categories is not necessary in the initial training phase.
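
The over-time learning idea in the summary can be illustrated with a small sketch. The record does not say which classifier the authors place on top of the learned features, so the nearest-class-mean rule below is an assumption chosen only because it makes adding categories cheap; the class name NearestCentroidCategorizer, the labels, and the two-dimensional "embeddings" are all hypothetical stand-ins for DCNN feature vectors.

```python
import math
from collections import defaultdict


class NearestCentroidCategorizer:
    """Toy nearest-class-mean classifier over precomputed embeddings.

    Assumption: embeddings come from some feature extractor (a DCNN in the
    paper); here they are plain lists of floats supplied by the caller.
    """

    def __init__(self):
        self._sums = {}                    # label -> running sum of embeddings
        self._counts = defaultdict(int)    # label -> number of examples seen

    def add_example(self, embedding, label):
        # A previously unseen label simply creates a new centroid, so new
        # categories can be introduced without retraining anything.
        if label not in self._sums:
            self._sums[label] = [0.0] * len(embedding)
        self._sums[label] = [s + x for s, x in zip(self._sums[label], embedding)]
        self._counts[label] += 1

    def centroid(self, label):
        n = self._counts[label]
        return [s / n for s in self._sums[label]]

    def predict(self, embedding):
        if not self._sums:
            raise ValueError("no labeled examples have been added yet")
        # Pick the label whose centroid is closest in Euclidean distance.
        return min(
            self._sums,
            key=lambda label: math.dist(embedding, self.centroid(label)),
        )


# Hypothetical usage: two known categories, then a third added later.
clf = NearestCentroidCategorizer()
clf.add_example([0.0, 0.1], "news")
clf.add_example([0.1, 0.0], "news")
clf.add_example([1.0, 0.9], "shopping")
print(clf.predict([0.1, 2.0]))   # forced to choose among known categories

clf.add_example([0.0, 2.0], "sports")
print(clf.predict([0.1, 2.0]))   # the new category is now available
```

In a distance-based scheme like this, the quality of the predictions depends entirely on how well the embedding space separates categories, which is the role the summary assigns to metric learning.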
