AI-based glioma grading for a trustworthy diagnosis: an analytical pipeline for improved reliability

Bibliographic Details
Authors: Pitarch i Abaigar, Carla (ORCID 0000-0002-6015-244X); Ribas Ripoll, Vicente Jorge; Vellido Alcacena, Alfredo (ORCID 0000-0002-9843-1911)
Resource type: article
Publication date: 2023
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/390777
Online access: https://hdl.handle.net/2117/390777
https://dx.doi.org/10.3390/cancers15133369
Access level: open access
Keywords: Gliomas
Tumors -- Classification
Machine learning
Glioma
Tumor grading
Decision support
Neuro-oncology
Radiology
Trustworthiness
Model certainty
Model robustness
Reliability
Gliomes
Tumors -- Classificació
Aprenentatge automàtic
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic
Àrees temàtiques de la UPC::Informàtica::Aplicacions de la informàtica::Bioinformàtica
Description
Summary: Glioma is the most common type of tumor in humans originating in the brain. According to the World Health Organization, gliomas can be graded on a four-stage scale, ranging from the most benign to the most malignant. Grading these tumors from image information is far from trivial for radiologists, who could be assisted in this task by machine-learning-based decision support. However, the machine learning analytical pipeline is itself fraught with perils stemming from different sources, such as inadvertent data leakage, the adequacy of 2D image sampling, or classifier assessment biases. In this paper, we analyze a glioma database sourced from multiple datasets using a simple classifier, aiming to obtain a reliable tumor grading and, along the way, provide a few guidelines to ensure such reliability. Our results reveal that focusing on the tumor region of interest and using data augmentation techniques significantly enhanced both the accuracy of tumor classifications and the confidence in them. Evaluation on an independent test set resulted in an AUC-ROC of 0.932 in the discrimination of low-grade gliomas from high-grade gliomas, and an AUC-ROC of 0.893 in the classification of grades 2, 3, and 4. The study also highlights the importance of providing, beyond generic classification performance, measures of how reliable and trustworthy the model's output is, thus assessing the model's certainty and robustness.
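
Illustrative note: the summary names inadvertent data leakage as one peril of the pipeline and reports results as AUC-ROC on an independent test set. The sketch below is not the authors' code. It is a minimal, standard-library-only illustration of two generic ideas under stated assumptions: splitting 2D slices at the patient level, so that no patient contributes images to both training and test sets, and computing a binary AUC-ROC through its pairwise-ranking definition. The names `split_by_patient` and `auc_roc`, the record layout `(patient_id, slice_id)`, and the 20% test fraction are all hypothetical choices made for this example.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, slice_id) records so no patient is in both sets.

    Hypothetical helper: shuffling patients rather than slices is one
    common way to avoid leakage between train and test partitions.
    """
    patients = sorted({patient_id for patient_id, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Hold out at least one patient, rounded from the requested fraction.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


def auc_roc(labels, scores):
    """Binary AUC-ROC as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy data: 10 hypothetical patients with 3 slices each.
    records = [(f"patient_{i}", j) for i in range(10) for j in range(3)]
    train, test = split_by_patient(records)
    shared = {p for p, _ in train} & {p for p, _ in test}
    print("patients shared between train and test:", len(shared))

    # Toy scores for a low-grade (0) versus high-grade (1) task.
    print("AUC-ROC:", auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise formulation is quadratic in the number of samples, so it suits small illustrations only. For real evaluations, an established library implementation would be the more sensible choice.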