Corpus for Complex Word Identification in Medical Spanish Texts (CWI-Med-Sp) [DATASET]

Bibliographic Details
Authors: Ortega Riba, Federico; Campillos-Llanos, Leonardo
Resource type: dataset
Publication date: 2024
Country: Spain
Institution: Consejo Superior de Investigaciones Científicas (CSIC)
Repository: DIGITAL.CSIC. Repositorio Institucional del CSIC
OAI Identifier: oai:digital.csic.es:10261/373675
Online access: http://hdl.handle.net/10261/373675
https://doi.org/10.20350/digitalCSIC/16706
Access level: open access
Keywords: Patient information documents
Annotated corpus
Medical text simplification
Biomedical natural language processing
Consent forms
Clinical trials
Linguistics
Medical sciences
Linguistic research
Description
Summary: [Description of methods used for collection/generation of data] The corpus statistics and methods are explained in the following article: Federico Ortega-Riba, Leonardo Campillos-Llanos, Doaa Samy (2025) "Lexical Simplification in Spanish Texts For Patients: The Complex Word Identification Task" (under review). [Methods for processing the data] Complex words (CW) were annotated manually according to the criteria defined in the guideline explained in the companion article.