Reducing the loss of information through annealing text distortion
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lis...
| Authors: | Granados Fontecha, Ana; Cebrián Ramos, Manuel; Camacho, David; Rodríguez Ortiz, Francisco Borja |
|---|---|
| Format: | article |
| Publication date: | 2011 |
| Country: | Spain |
| Resources: | Universidad Autónoma de Madrid |
| Repository: | Biblos-e Archivo. Repositorio Institucional de la UAM |
| Language: | English |
| OAI Identifier: | oai:repositorio.uam.es:10486/663413 |
| Online access: | http://hdl.handle.net/10486/663413 https://dx.doi.org/10.1109/TKDE.2010.173 |
| Access level: | open access |
| Keywords: | Information distortion; Kolmogorov complexity; Clustering by compression; Data compression; Normalized compression distance; Computer science |
| id |
ES_739a8be3486d8aedd8403de07fff995e |
|---|---|
| oai_identifier_str |
oai:repositorio.uam.es:10486/663413 |
| network_acronym_str |
ES |
| network_name_str |
España |
| repository_id_str |
|
| spelling |
Reducing the loss of information through annealing text distortion. Granados Fontecha, Ana; Cebrián Ramos, Manuel; Camacho, David; Rodríguez Ortiz, Francisco Borja. Information distortion; Kolmogorov complexity; Clustering by compression; Data compression; Normalized compression distance; Informática. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011. Compression distances have been widely used in knowledge discovery and data mining. They are parameter-free, widely applicable, and very effective in several domains. However, little has been done to interpret their results or to explain their behavior. In this paper, we take a step toward understanding compression distances by performing an experimental evaluation of the impact of several kinds of information distortion on compression-based text clustering. We show how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy. In fact, we show how the nondistorted text clustering can be improved by means of annealing text distortion. The experimental results shown in this paper are consistent across different data sets and different compression algorithms belonging to the most important compression families: Lempel-Ziv, Statistical, and Block-Sorting. This work was supported by the Spanish Ministry of Education and Science under the TIN2010-19872 and TIN2010-19607 projects. IEEE Computer Soc. Departamento de Ingeniería Informática; Escuela Politécnica Superior; Neurocomputación Biológica (ING EPS-005); Análisis de Datos e Inteligencia Aplicada (ING EPS-012); Herramientas Interactivas Avanzadas (ING EPS-003). 2011; 2011-07-01; research article; http://purl.org/coar/resource_type/c_2df8fbb1; VoR; http://purl.org/coar/version/c_970fb48d4fbd8a85; info:eu-repo/semantics/article; application/pdf; http://hdl.handle.net/10486/663413; https://dx.doi.org/10.1109/TKDE.2010.173; reponame:Biblos-e Archivo. Repositorio Institucional de la UAM; instname:Universidad Autónoma de Madrid; Inglés; eng; open access; http://purl.org/coar/access_right/c_abf2; info:eu-repo/semantics/openAccess; oai:repositorio.uam.es:10486/663413; 2026-06-23T12:46:27Z |
| dc.title.none.fl_str_mv |
Reducing the loss of information through annealing text distortion |
| title |
Reducing the loss of information through annealing text distortion |
| spellingShingle |
Reducing the loss of information through annealing text distortion; Granados Fontecha, Ana; Information distortion; Kolmogorov complexity; Clustering by compression; Data compression; Normalized compression distance; Informática |
| title_short |
Reducing the loss of information through annealing text distortion |
| title_full |
Reducing the loss of information through annealing text distortion |
| title_fullStr |
Reducing the loss of information through annealing text distortion |
| title_full_unstemmed |
Reducing the loss of information through annealing text distortion |
| title_sort |
Reducing the loss of information through annealing text distortion |
| dc.creator.none.fl_str_mv |
Granados Fontecha, Ana; Cebrián Ramos, Manuel; Camacho, David; Rodríguez Ortiz, Francisco Borja |
| author |
Granados Fontecha, Ana |
| author_facet |
Granados Fontecha, Ana; Cebrián Ramos, Manuel; Camacho, David; Rodríguez Ortiz, Francisco Borja |
| author_role |
author |
| author2 |
Cebrián Ramos, Manuel; Camacho, David; Rodríguez Ortiz, Francisco Borja |
| author2_role |
author author author |
| dc.contributor.none.fl_str_mv |
Departamento de Ingeniería Informática; Escuela Politécnica Superior; Neurocomputación Biológica (ING EPS-005); Análisis de Datos e Inteligencia Aplicada (ING EPS-012); Herramientas Interactivas Avanzadas (ING EPS-003) |
| dc.subject.none.fl_str_mv |
Information distortion; Kolmogorov complexity; Clustering by compression; Data compression; Normalized compression distance; Informática |
| topic |
Information distortion; Kolmogorov complexity; Clustering by compression; Data compression; Normalized compression distance; Informática |
| description |
Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Granados, A.; Cebrian, M.; Camacho, D.; de Borja Rodriguez, F. "Reducing the Loss of Information through Annealing Text Distortion". IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 7, pp. 1090-1102, July 2011 |
| publishDate |
2011 |
| dc.date.none.fl_str_mv |
2011 2011-07-01 |
| dc.type.none.fl_str_mv |
research article http://purl.org/coar/resource_type/c_2df8fbb1 VoR http://purl.org/coar/version/c_970fb48d4fbd8a85 |
| dc.type.openaire.fl_str_mv |
info:eu-repo/semantics/article |
| format |
article |
| dc.identifier.none.fl_str_mv |
http://hdl.handle.net/10486/663413 https://dx.doi.org/10.1109/TKDE.2010.173 |
| url |
http://hdl.handle.net/10486/663413 https://dx.doi.org/10.1109/TKDE.2010.173 |
| dc.language.none.fl_str_mv |
Inglés eng |
| language_invalid_str_mv |
Inglés |
| language |
eng |
| dc.rights.none.fl_str_mv |
open access http://purl.org/coar/access_right/c_abf2 |
| dc.rights.openaire.fl_str_mv |
info:eu-repo/semantics/openAccess |
| rights_invalid_str_mv |
open access http://purl.org/coar/access_right/c_abf2 |
| eu_rights_str_mv |
openAccess |
| dc.format.none.fl_str_mv |
application/pdf |
| dc.publisher.none.fl_str_mv |
IEEE Computer Soc. |
| publisher.none.fl_str_mv |
IEEE Computer Soc. |
| dc.source.none.fl_str_mv |
reponame:Biblos-e Archivo. Repositorio Institucional de la UAM instname:Universidad Autónoma de Madrid |
| instname_str |
Universidad Autónoma de Madrid |
| reponame_str |
Biblos-e Archivo. Repositorio Institucional de la UAM |
| collection |
Biblos-e Archivo. Repositorio Institucional de la UAM |
| repository.name.fl_str_mv |
|
| repository.mail.fl_str_mv |
|
| _version_ |
1869410822093012992 |
| score |
15,300724 |
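
The abstract in this record refers to the normalized compression distance and to progressively removing words from a document. As an illustration only, the following minimal sketch shows how such a distance and a simple word-removal step might be computed. It is not the authors' implementation. The assumptions are: the commonly used formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size; Python's `zlib` (a Lempel-Ziv family compressor) as the default; and a hypothetical `remove_top_words` helper that drops the most frequent words of the text itself, a selection rule the record does not specify.

```python
import re
import zlib
from collections import Counter


def compressed_size(data: bytes, compressor=zlib.compress) -> int:
    """Length in bytes of the compressed representation of data."""
    return len(compressor(data))


def ncd(x: bytes, y: bytes, compressor=zlib.compress) -> float:
    """Normalized compression distance under the given compressor.

    Uses the commonly cited formula
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)
    return (cxy - min(cx, cy)) / max(cx, cy)


def remove_top_words(text: str, k: int) -> str:
    """Hypothetical distortion step: drop the k most frequent words.

    Assumption for illustration: frequency is counted within the text
    itself; the record does not state how words are selected.
    """
    words = re.findall(r"\w+", text.lower())
    top = {word for word, _ in Counter(words).most_common(k)}
    return " ".join(word for word in words if word not in top)


if __name__ == "__main__":
    doc_a = "the cat sat on the mat and the cat slept on the mat " * 10
    doc_b = "stock prices fell as the market reacted to the rate news " * 10
    # Increasing k removes more words, giving progressively distorted texts.
    for k in (0, 1, 3):
        a = remove_top_words(doc_a, k).encode("utf-8")
        b = remove_top_words(doc_b, k).encode("utf-8")
        print(k, round(ncd(a, b), 3))
```

Any function that maps `bytes` to `bytes` can be passed as `compressor`, for example `bz2.compress` for a block-sorting compressor, which allows the same comparison to be repeated across compression families.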