Machine Learning Study of Metabolic Networks vs ChEMBL Data of Antibacterial Compounds

Antibacterial drugs (AD) change the metabolic status of bacteria, contributing to bacterial death. However, antibiotic resistance and the emergence of multidrug-resistant bacteria increase interest in understanding metabolic network (MN) mutations and the interaction of AD vs MN. In this study, we e...

Full description

Bibliographic Details
Authors: Diéguez, Karel, Casañola, Gerardo, Torres, Roldán, Rasulev, Bakhtiyor, Green, James R., González-Díaz, Humberto
Format: article
Status:Published version
Publication Date:2022
Country:España
Institution:Consejo Superior de Investigaciones Científicas (CSIC)
Repository:DIGITAL.CSIC. Repositorio Institucional del CSIC
OAI Identifier:oai:digital.csic.es:10261/296433
Online Access:http://hdl.handle.net/10261/296433
Access Level:Open access
Keyword:ChEMBL
Information fusion
Machine learning
Antibacterial compounds
Multidrug-resistant
Complex networks
Perturbation theory
Description
Summary:Antibacterial drugs (AD) change the metabolic status of bacteria, contributing to bacterial death. However, antibiotic resistance and the emergence of multidrug-resistant bacteria increase interest in understanding metabolic network (MN) mutations and the interaction of AD vs MN. In this study, we employed the IFPTML = Information Fusion (IF) + Perturbation Theory (PT) + Machine Learning (ML) algorithm on a huge dataset from the ChEMBL database, which contains >155,000 AD assays vs >40 MNs of multiple bacteria species. We built a linear discriminant analysis (LDA) and 17 ML models centered on the linear index and based on atoms to predict antibacterial compounds. The IFPTML-LDA model presented the following results for the training subset: specificity (Sp) = 76% out of 70,000 cases, sensitivity (Sn) = 70%, and Accuracy (Acc) = 73%. The same model also presented the following results for the validation subsets: Sp = 76%, Sn = 70%, and Acc = 73.1%. Among the IFPTML nonlinear models, the k nearest neighbors (KNN) showed the best results with Sn = 99.2%, Sp = 95.5%, Acc = 97.4%, and Area Under Receiver Operating Characteristic (AUROC) = 0.998 in training sets. In the validation series, the Random Forest had the best results: Sn = 93.96% and Sp = 87.02% (AUROC = 0.945). The IFPTML linear and nonlinear models regarding the ADs vs MNs have good statistical parameters, and they could contribute toward finding new metabolic mutations in antibiotic resistance and reducing time/costs in antibacterial drug research.