PTML Combinatorial Model of ChEMBL Compounds Assays for Multiple Types of Cancer

Bibliographic Details
Authors: Bediaga Bañeres, Harbil; Arrasate Gil, Sonia; González Díaz, Humberto
Resource type: article
Publication date: 2018
Country: Spain
Institution: Universidad del País Vasco
Repository: Addi. Archivo Digital para la Docencia y la Investigación
OAI Identifier: oai:addi.ehu.eus:10810/72593
Online access: http://hdl.handle.net/10810/72593
Access level: open access
Keywords: ChEMBL
anti-cancer compounds
perturbation theory
machine learning
artificial neural networks
big data
multi-target models
Description
Summary: Determining the target proteins of new anti-cancer compounds is a very important task in Medicinal Chemistry. In this sense, chemists carry out preclinical assays with a high number of combinations of experimental conditions (cj). In fact, the ChEMBL database contains outcomes of 65534 different anticancer activity preclinical assays for 35565 different chemical compounds (1.84 assays per compound). These assays cover different combinations of cj formed from >70 different biological activity parameters (c0), >300 different drug targets (c1), >230 cell lines (c2), and 5 organisms of assay (c3) and/or organisms of the target (c4), etc. The dataset includes a total of 45833 assays in leukemia, 2499 in ovarian cancer, 6227 in breast, 3499 in colon, 3159 in lung, 2750 in prostate, 601 in melanoma, etc. This is a very complex dataset with multiple Big Data features, and it is hard for researchers to rationalize in order to extract useful relationships and predict new compounds. In this context, we propose to combine Perturbation Theory (PT) ideas and Machine Learning (ML) modeling to solve this combinatorial-like problem. In this work, we report a PTML (PT + ML) model for the ChEMBL dataset of preclinical assays of anti-cancer compounds. This is a simple but very powerful linear model with only three variables, with AUROC = 0.872, Specificity = Sp(%) = 90.2, Sensitivity = Sn(%) = 70.6, and overall Accuracy = Ac(%) = 87.7 in the training series. The model also has Sp(%) = 90.1, Sn(%) = 71.4, and Ac(%) = 87.8 in the external validation series. The model uses PT operators based on multi-condition moving averages to capture the complexity of the dataset. We also compared the model with non-linear Artificial Neural Network (ANN) models, obtaining similar results. This confirms the hypothesis of a linear relationship between the PT operators and the classification as anti-cancer compounds in different combinations of assay conditions.
Last, we compared the model with other PTML models reported in the literature, concluding that this is the only PTML model able to predict activity against multiple types of cancer. This model is a simple but versatile tool for predicting the targets of anti-cancer compounds, taking into consideration multiple combinations of experimental conditions in preclinical assays.
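The summary names "PT operators based on multi-condition moving averages" and reports Sp, Sn and Ac percentages, but it does not give formulas. The Python sketch below is one plausible reading, offered as an assumption rather than the authors' method. Each PT operator is taken as the deviation of a molecular descriptor from its average over all assays that share the same level of a condition cj. Sn, Sp and Ac are computed from a confusion matrix, with 1 meaning "active". The descriptor name `D`, the record layout, the function names `pt_operators` and `classification_metrics`, and the choice to average over every record are all hypothetical. The paper may restrict which compounds enter each average.

```python
from collections import defaultdict
from statistics import mean


def pt_operators(records, descriptor, condition_keys):
    """Hypothetical PT operators: descriptor minus its mean per condition level.

    records: list of dicts holding a descriptor value and condition fields
    such as "c0" (activity parameter) or "c2" (cell line). Assumption: the
    average is taken over all records, not a training/active subset.
    """
    # Mean of the descriptor over all records sharing each level of each cj.
    averages = {}
    for key in condition_keys:
        groups = defaultdict(list)
        for rec in records:
            groups[rec[key]].append(rec[descriptor])
        averages[key] = {level: mean(vals) for level, vals in groups.items()}
    # One deviation (PT operator) per condition key for every record.
    return [
        {key: rec[descriptor] - averages[key][rec[key]] for key in condition_keys}
        for rec in records
    ]


def classification_metrics(y_true, y_pred):
    """Return Sn(%), Sp(%), Ac(%) with 1 = active (positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sn = 100 * tp / (tp + fn)          # sensitivity: actives recovered
    sp = 100 * tn / (tn + fp)          # specificity: inactives recovered
    ac = 100 * (tp + tn) / len(pairs)  # overall accuracy
    return sn, sp, ac


# Toy, made-up data for illustration only (not taken from ChEMBL).
records = [
    {"D": 2.0, "c0": "IC50", "c2": "K562"},
    {"D": 4.0, "c0": "IC50", "c2": "MCF7"},
    {"D": 6.0, "c0": "GI50", "c2": "K562"},
]
print(pt_operators(records, "D", ["c0", "c2"]))
# [{'c0': -1.0, 'c2': -2.0}, {'c0': 1.0, 'c2': 0.0}, {'c0': 0.0, 'c2': 2.0}]
print(classification_metrics([1, 0, 0, 1, 0], [1, 0, 1, 0, 0]))
```

In a full PTML workflow, deviations like these would serve as inputs to a linear classifier. The sketch does not reproduce the three variables selected in the paper, and it omits AUROC, which requires continuous scores rather than hard labels.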