Urista Diana V, Carrué Diego B, Otero Iago, Arrasate Sonia, Quevedo-Tumailli Viviana F, Gestal Marcos, González-Díaz Humbert, Munteanu Cristian R
Department of Organic Chemistry II, University of the Basque Country (UPV/EHU), Sarriena w/n, 48940 Leioa, Spain.
RNASA-IMEDIR, Computer Science Faculty, CITIC, University of A Coruña, Campus Elviña s/n, 15071 A Coruña, Spain.
Biology (Basel). 2020 Jul 30;9(8):198. doi: 10.3390/biology9080198.
Drug-decorated nanoparticles (DDNPs) have important medical applications. This work combines Perturbation Theory with Machine Learning and Information Fusion (PTMLIF) to build models that predict the probability that a nanoparticle-compound/drug complex has antimalarial activity (against Plasmodium). The aim is to save experimental time and resources through virtual screening of DDNPs. The raw data were obtained by fusing experimental data for nanoparticles with chemical assay data for compounds from the ChEMBL database. The inputs to the eight Machine Learning classifiers were transformed features of the drugs/compounds and nanoparticles, expressed as perturbations of molecular descriptors under specific experimental conditions (experiment-centered features). The resulting dataset contains 107 input features and 249,992 examples. The best classification model was a Random Forest using 27 selected features of drugs/compounds and nanoparticles across all the experimental conditions considered. It achieved a mean Area Under the Receiver Operating Characteristic curve (AUC) of 0.9921 ± 0.000244 on the test subset (10-fold cross-validation). These results demonstrate the value of fusing experiment-centered features of drugs/compounds and nanoparticles for predicting nanoparticle-compound antimalarial activity. The scripts and dataset for this project are available in an open GitHub repository.
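To make the idea of experiment-centered features concrete, the following minimal Python sketch computes one plausible form of such a perturbation: the deviation of a molecular descriptor from the mean of that descriptor over all cases sharing the same experimental condition. The abstract does not state the exact transformation used, so this formula is an assumption, and the function name `experiment_centered`, the descriptor key `logp`, the condition key `assay`, and the toy records are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def experiment_centered(rows, descriptor, condition):
    """Return, for each row, descriptor minus the mean descriptor of its condition.

    Assumed form of a perturbation feature; not taken from the source.
    """
    # Collect descriptor values per experimental condition.
    groups = defaultdict(list)
    for row in rows:
        groups[row[condition]].append(row[descriptor])

    # Mean descriptor value within each condition.
    condition_mean = {c: mean(values) for c, values in groups.items()}

    # Deviation of each case from the mean of its own condition.
    return [row[descriptor] - condition_mean[row[condition]] for row in rows]


# Hypothetical toy records: one descriptor measured under two assay conditions.
rows = [
    {"logp": 1.0, "assay": "A"},
    {"logp": 3.0, "assay": "A"},
    {"logp": 2.0, "assay": "B"},
    {"logp": 6.0, "assay": "B"},
]

deltas = experiment_centered(rows, "logp", "assay")
print(deltas)  # [-1.0, 1.0, -2.0, 2.0]
```

Centering each descriptor on its condition mean lets a single classifier see cases from many assays at once, because each feature then encodes how a compound or nanoparticle departs from what is typical under its own experimental condition.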