SynthVal: A framework for validating synthetic medical images

Bibliographic Details
Authors: Guidotti, Dario; Pandolfo, Laura; Gutiérrez Torre, Alberto (ORCID 0000-0002-5548-3359); López Rúbio, Omar; Pulina, Luca
Resource type: article
Publication date: 2025
Country: Spain
Institution: Universitat Politècnica de Catalunya (UPC)
Repository: UPCommons. Portal del coneixement obert de la UPC
Language: English
OAI Identifier: oai:upcommons.upc.edu:2117/447083
Online access: https://hdl.handle.net/2117/447083
https://dx.doi.org/10.1109/ACCESS.2025.3633780
Access level: open access
Keywords: Synthetic data validation
Similarity metrics
Health data
Features extraction
UPC subject areas::Computer science::Computer applications::Bioinformatics
UPC subject areas::Computer science::Artificial intelligence
Description
Summary: Synthetic data is increasingly used in medical imaging to overcome data scarcity and privacy constraints. However, assessing the fidelity of synthetic images remains a critical challenge for ensuring their safe and effective use in clinical and AI applications. We present SynthVal, a Python-based framework for validating the quality of synthetic medical images through statistical comparisons in deep feature space. SynthVal extracts semantic image embeddings using transformer-based models and computes similarity metrics – including Fréchet Distance, Wasserstein Distance, and Kullback-Leibler Divergence – between real and synthetic data distributions. The framework is designed for modularity, scalability, and ease of integration into existing workflows via pip installation. We evaluate SynthVal using real images from the CSAW-CC dataset and synthetic images produced by a model developed by the Barcelona Supercomputing Center. Our experiments systematically benchmark the influence of different feature extraction models and similarity metrics, providing practical insights for selecting validation strategies in medical image synthesis. SynthVal offers a reproducible and extensible solution for quality control in synthetic data pipelines.
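
The three similarity metrics named in the summary can be illustrated with a minimal NumPy sketch. It is not SynthVal's API: the function names are hypothetical, random arrays stand in for transformer embeddings, and several choices are assumptions made only for illustration (fitting a Gaussian to each feature set for the Fréchet and Kullback-Leibler terms, and averaging one-dimensional Wasserstein-1 distances over feature dimensions). The record does not state how SynthVal itself computes these quantities.

```python
import numpy as np


def gaussian_stats(features, eps=1e-6):
    """Mean and lightly regularized covariance of an (n, d) feature matrix."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + eps * np.eye(features.shape[1])
    return mu, cov


def frechet_distance(real, synth):
    """Squared Frechet distance between Gaussians fitted to each feature set."""
    mu_r, cov_r = gaussian_stats(real)
    mu_s, cov_s = gaussian_stats(synth)
    # Tr((cov_r cov_s)^(1/2)) is the sum of square roots of the product's eigenvalues.
    eigvals = np.linalg.eigvals(cov_r @ cov_s).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_sqrt)


def gaussian_kl(real, synth):
    """KL(real || synth) under the same Gaussian assumption."""
    mu_r, cov_r = gaussian_stats(real)
    mu_s, cov_s = gaussian_stats(synth)
    inv_s = np.linalg.inv(cov_s)
    diff = mu_s - mu_r
    _, logdet_r = np.linalg.slogdet(cov_r)
    _, logdet_s = np.linalg.slogdet(cov_s)
    d = mu_r.size
    return float(0.5 * (np.trace(inv_s @ cov_r) + diff @ inv_s @ diff
                        - d + logdet_s - logdet_r))


def mean_wasserstein_1d(real, synth):
    """Wasserstein-1 distance per feature, averaged over features.

    With equal sample counts, the 1-D distance is the mean absolute
    difference between the sorted samples.
    """
    if real.shape != synth.shape:
        raise ValueError("this sketch assumes equal sample counts and dimensions")
    return float(np.abs(np.sort(real, axis=0) - np.sort(synth, axis=0)).mean())


# Stand-in "embeddings": 500 samples of 8 features each (assumed sizes).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 8))
close = rng.normal(0.0, 1.0, size=(500, 8))    # same distribution as `real`
shifted = rng.normal(1.0, 1.0, size=(500, 8))  # mean shifted by 1 per feature

for name, synth in [("close", close), ("shifted", shifted)]:
    print(name,
          frechet_distance(real, synth),
          gaussian_kl(real, synth),
          mean_wasserstein_1d(real, synth))
```

In this toy setup all three scores should be markedly larger for the shifted set than for the set drawn from the same distribution as the real features, which is the behavior a fidelity metric is expected to show. With real embeddings, the feature dimension is typically much larger than 8, so the covariance estimate needs correspondingly more samples or stronger regularization.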