Reducing the loss of information through annealing text distortion


Bibliographic details
Authors: Granados Fontecha, Ana; Cebrián Ramos, Manuel; Camacho, David; Rodríguez Ortiz, Francisco Borja
Format: article
Publication date: 2011
Country: Spain
Resources: Universidad Autónoma de Madrid
Repository: Biblos-e Archivo. Repositorio Institucional de la UAM
Language: English
OAI Identifier: oai:repositorio.uam.es:10486/663413
Online access: http://hdl.handle.net/10486/663413
https://dx.doi.org/10.1109/TKDE.2010.173
Access level: open access
Keywords: Information distortion
Kolmogorov complexity
Clustering by compression
Data compression
Normalized compression distance
Computer science

Rights statement: Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Citation: Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011.

Abstract: Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by performing an experimental evaluation of the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy. In fact, we show how the nondistorted text clustering can be improved by means of annealing text distortion. The experimental results shown in this paper are consistent using different data sets, and different compression algorithms belonging to the most important compression families: Lempel-Ziv, Statistical and Block-Sorting.

Funding: This work was supported by the Spanish Ministry of Education and Science under TIN2010-19872 and TIN2010-19607 projects.
Publisher: IEEE Computer Soc.
Contributors: Departamento de Ingeniería Informática; Escuela Politécnica Superior; Neurocomputación Biológica (ING EPS-005); Análisis de Datos e Inteligencia Aplicada (ING EPS-012); Herramientas Interactivas Avanzadas (ING EPS-003)
Date: 2011-07-01
Type: research article (http://purl.org/coar/resource_type/c_2df8fbb1)
Version: VoR (http://purl.org/coar/version/c_970fb48d4fbd8a85)
File format: application/pdf
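
The abstract refers to normalized compression distance (NCD) and to distorting text by progressively removing words. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The NCD formula is the commonly used definition. The standard-library compressors stand in for two of the families named in the abstract: zlib for Lempel-Ziv and bz2 for block-sorting, plus lzma (an LZ-based hybrid); the standard library has no statistical (PPM-style) compressor. The `distort` and `annealing_schedule` helpers are hypothetical: they assume removed words are masked with asterisks and that the least frequent words are removed first, which may differ from the paper's procedure.

```python
import bz2
import lzma
import zlib
from collections import Counter
from typing import Callable, Dict, Iterator

# Stand-in compressors from the Python standard library (an assumption, not
# the paper's tool set): zlib is Lempel-Ziv based (Deflate), bz2 is
# Burrows-Wheeler block-sorting, lzma is LZMA. No PPM compressor is included.
COMPRESSORS: Dict[str, Callable[[bytes], bytes]] = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def ncd(x: bytes, y: bytes, compressor: str = "zlib") -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed length in bytes."""
    compress = COMPRESSORS[compressor]
    cx, cy = len(compress(x)), len(compress(y))
    cxy = len(compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def distort(text: str, keep: int) -> str:
    """Hypothetical distortion: keep the `keep` most frequent words of `text`
    and replace every character of all other words with '*'."""
    words = text.split()
    kept = {word for word, _ in Counter(words).most_common(keep)}
    return " ".join(w if w in kept else "*" * len(w) for w in words)


def annealing_schedule(text: str, steps: int) -> Iterator[str]:
    """Hypothetical schedule: yield `steps + 1` versions of `text`, from
    undistorted to fully masked, shrinking the kept vocabulary linearly so
    that more of the text is masked at each step."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    vocab = len(set(text.split()))
    for i in range(steps + 1):
        yield distort(text, vocab * (steps - i) // steps)


if __name__ == "__main__":
    # Toy documents (made up for illustration only).
    doc_a = "the cat sat on the mat and the cat slept " * 20
    doc_b = "the dog sat on the rug and the dog barked " * 20
    pairs = zip(annealing_schedule(doc_a, 4), annealing_schedule(doc_b, 4))
    for level, (a, b) in enumerate(pairs):
        print(level, round(ncd(a.encode(), b.encode()), 3))
```

Running the script prints the NCD between two toy documents at each distortion level. It does not reproduce the paper's data sets, clustering method, or results.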