Bridging AI and medical expertise: ChatGPT's success on the medical specialization residency admission exam in Spain

Bibliographic Details
Authors: Leis Machín, Angela, 1974-; Mayer, Miguel Ángel, 1960-; Mayer, Alex
Resource type: article
Status: Published version
Publication date: 2025
Country: Spain
Institution: Universitat Pompeu Fabra
Repository: Repositorio Digital de la UPF
OAI Identifier:oai:dnet:rdupf_______::01ede4442349cbb48d629e93d817f953
Online access: https://hdl.handle.net/10230/72970
http://dx.doi.org/10.3233/SHTI250544
Access level: open access
Keywords: ChatGPT
Generative AI
Health profession students
Medical education
Description
Summary: The growing use of Artificial Intelligence (AI) in healthcare, and in particular the potential of generative AI models such as ChatGPT-4, is a trending topic. This study examines how ChatGPT-4 performed on the national Medicine Residency exam in Spain, a highly selective test for access to the medical specialization training program known as MIR. ChatGPT-4 answered 210 questions, including 25 that required image interpretation. The chatbot correctly answered 150 out of 200 questions, achieving an estimated ranking of roughly 1,900 to 2,300 out of 11,577 candidates. This performance would allow access to most medical specialties in Spain. No significant differences were found between questions requiring image analysis and those that did not, but ChatGPT struggled with more difficult questions, showing a higher error rate on complex problems, much as a human would. Despite its potential as an educational and problem-solving tool, the study highlights ChatGPT's limitations, including occasional "AI hallucinations" (incorrect or nonsensical answers) and variability in its responses when questions were repeated. The study emphasizes that while AI tools such as ChatGPT can assist in education and medical tasks, they cannot replace qualified healthcare professionals, and their output requires careful verification.