A practical solution to estimate the sample size required for clinical prediction models generated from observational research on data

Bibliographic Details
Authors: Baeza-Delgado, Carlos; Cerdá Alberich, Leonor; Veiga-Canuto, Diana; Martinez de las Heras, Blanca; Raza, Ben; Marti-Bonmati, Luis; Carot Sierra, José Miguel (ORCID: 0000-0001-6524-1639)
Resource type: article
Publication date: 2022
Country: Spain
Institution: Universitat Politècnica de València (UPV)
Repository: RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Language: English
OAI Identifier: oai:riunet.upv.es:10251/202493
Online access: https://riunet.upv.es/handle/10251/202493
Access Level: open access
Keywords: Sample size calculation
Clinical predictive models
PRIMAGE
Paediatric oncology
Radiology
STATISTICS AND OPERATIONS RESEARCH
Description
Summary: Background: Estimating the required sample size is crucial when developing and validating clinical prediction models. However, there is no consensus about how to determine the sample size in such a setting. Here, the goal was to compare available methods to define a practical solution to sample size estimation for clinical predictive models, as applied to Horizon 2020 PRIMAGE as a case study. Methods: Three different methods (Riley's; "rule of thumb" with 10 and 5 events per predictor) were used to calculate the sample size required to develop predictive models and to analyse how that sample size varies as a function of different parameters. Subsequently, the sample size for model validation was also estimated. Results: To develop reliable predictive models, 1397 neuroblastoma patients, 1060 high-risk neuroblastoma patients and 1345 diffuse intrinsic pontine glioma (DIPG) patients are required. This sample size can be lowered by reducing the number of variables included in the model, by including direct measures of the outcome to be predicted and/or by increasing the follow-up period. For model validation, the estimated sample size was 326 patients for neuroblastoma, 246 for high-risk neuroblastoma, and 592 for DIPG. Conclusions: Given the variability of the different sample sizes obtained, we recommend using methods based on epidemiological data and the nature of the results, as their results are tailored to the specific clinical problem. In addition, sample size can be reduced by lowering the number of predictor parameters and by including direct measures of the outcome of interest.
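
The two families of methods named in the summary can be illustrated with a minimal Python sketch. It is not the authors' calculation: the function names and every input value (number of predictors, event fraction, anticipated Cox-Snell R², target shrinkage) are hypothetical, chosen only for illustration. The first function applies the events-per-predictor rule of thumb. The second applies only the shrinkage criterion published by Riley and colleagues, which is one of several criteria in that approach; whether the record's figures rest on this particular criterion is an assumption.

```python
import math


def epv_sample_size(n_predictors, events_per_predictor, event_fraction):
    """Rule of thumb: require a fixed number of events per candidate predictor.

    event_fraction is the anticipated proportion of patients with the outcome
    (a hypothetical input here, not a value taken from the record).
    """
    if not 0 < event_fraction <= 1:
        raise ValueError("event_fraction must be in (0, 1]")
    events_needed = events_per_predictor * n_predictors
    # Convert the required number of events into a required number of patients.
    return math.ceil(events_needed / event_fraction)


def riley_shrinkage_sample_size(n_predictors, r2_cs, shrinkage=0.9):
    """Shrinkage criterion: n = p / ((S - 1) * ln(1 - R2_cs / S)).

    r2_cs is the anticipated Cox-Snell R-squared of the model and S the
    targeted shrinkage factor; both are assumptions supplied by the user.
    """
    if not 0 < r2_cs < shrinkage < 1:
        raise ValueError("need 0 < r2_cs < shrinkage < 1")
    n = n_predictors / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical scenario: 10 candidate predictors, 25% of patients with the event.
    print(epv_sample_size(10, 10, 0.25))   # 10 events per predictor
    print(epv_sample_size(10, 5, 0.25))    # 5 events per predictor
    print(riley_shrinkage_sample_size(10, r2_cs=0.2))
```

In both functions the required sample size grows with the number of predictors, which is consistent with the summary's point that including fewer variables lowers the requirement.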