AI-based glioma grading for a trustworthy diagnosis: an analytical pipeline for improved reliability
| Authors: | , , |
|---|---|
| Resource type: | article |
| Publication date: | 2023 |
| Country: | Spain |
| Institution: | Universitat Politècnica de Catalunya (UPC) |
| Repository: | UPCommons. Portal del coneixement obert de la UPC |
| Language: | English |
| OAI Identifier: | oai:upcommons.upc.edu:2117/390777 |
| Online access: | https://hdl.handle.net/2117/390777 https://dx.doi.org/10.3390/cancers15133369 |
| Access level: | open access |
| Keywords: | Gliomas; Tumors -- Classification; Machine learning; Glioma; Tumor grading; Decision support; Neuro-oncology; Radiology; Trustworthiness; Model certainty; Model robustness; Reliability; Gliomes; Tumors -- Classificació; Aprenentatge automàtic; Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic; Àrees temàtiques de la UPC::Informàtica::Aplicacions de la informàtica::Bioinformàtica |
| Summary: | Glioma is the most common type of tumor in humans originating in the brain. According to the World Health Organization, gliomas can be graded on a four-stage scale, ranging from the most benign to the most malignant. The grading of these tumors from image information is a far from trivial task for radiologists and one in which they could be assisted by machine-learning-based decision support. However, the machine learning analytical pipeline is also fraught with perils stemming from different sources, such as inadvertent data leakage, adequacy of 2D image sampling, or classifier assessment biases. In this paper, we analyze a glioma database sourced from multiple datasets using a simple classifier, aiming to obtain a reliable tumor grading and, on the way, we provide a few guidelines to ensure such reliability. Our results reveal that, by focusing on the tumor region of interest and using data augmentation techniques, we significantly enhanced the accuracy and confidence in tumor classifications. Evaluation on an independent test set resulted in an AUC-ROC of 0.932 in the discrimination of low-grade gliomas from high-grade gliomas, and an AUC-ROC of 0.893 in the classification of grades 2, 3, and 4. The study also highlights the importance of providing, beyond generic classification performance, measures of how reliable and trustworthy the model’s output is, thus assessing the model’s certainty and robustness. |
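
The summary names inadvertent data leakage and AUC-ROC evaluation on an independent test set. The sketch below illustrates one common way to guard against leakage when several 2D slices come from the same patient: partition by patient rather than by slice, then score the held-out predictions. It is a minimal illustration under stated assumptions, not the authors' pipeline. The function names `split_by_patient` and `auc_roc`, the patient identifiers, and the example scores are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_patient(slice_patient_ids, test_fraction=0.2, seed=0):
    """Split slice indices so that no patient contributes to both partitions.

    Hypothetical helper: `slice_patient_ids[i]` is the patient that slice i came from.
    """
    by_patient = defaultdict(list)
    for idx, pid in enumerate(slice_patient_ids):
        by_patient[pid].append(idx)

    patients = sorted(by_patient)  # fixed order before the seeded shuffle
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))

    test_idx = [i for p in patients[:n_test] for i in by_patient[p]]
    train_idx = [i for p in patients[n_test:] for i in by_patient[p]]
    return train_idx, test_idx


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative data only: two slices per patient, five patients.
patient_ids = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4", "p5", "p5"]
train_idx, test_idx = split_by_patient(patient_ids, test_fraction=0.2)

# Made-up binary labels (1 = high grade) and classifier scores.
example_auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

Splitting at the patient level means the test partition contains only patients the classifier never saw during training, so near-identical neighbouring slices cannot inflate the reported score. The pairwise AUC shown here is quadratic in the number of samples, which is acceptable for a sketch but not for large test sets.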