A general prediction model for compound-protein interactions based on deep learning.

Authors

Ji Wei, She Shengnan, Qiao Chunxue, Feng Qiuqi, Rui Mengjie, Xu Ximing, Feng Chunlai

Affiliations

School of Pharmacy, Jiangsu University, Zhenjiang, China.

School of Medicine, Jiangsu University, Zhenjiang, China.

Publication information

Front Pharmacol. 2024 Sep 4;15:1465890. doi: 10.3389/fphar.2024.1465890. eCollection 2024.

Abstract

BACKGROUND

The identification of compound-protein interactions (CPIs) is crucial for drug discovery and for understanding mechanisms of action. Accurate CPI prediction can elucidate drug-target-disease interactions, aiding in the discovery of candidate compounds and effective synergistic drugs, particularly from traditional Chinese medicine (TCM). Existing methods face challenges in prediction accuracy and generalization because compounds and targets are highly diverse and because large-scale interaction datasets and negative datasets for model learning are lacking.

METHODS

To address these issues, we developed a computational model for CPI prediction by integrating a newly constructed large-scale bioactivity benchmark dataset with a deep learning (DL) algorithm. To verify the accuracy of our CPI model, we applied it to predict the targets of compounds in TCM. The herb pair of Astragalus membranaceus and Hedyotis diffusa was used as a model, and the active compounds in this herb pair were collected from various public databases and the literature. The complete targets of these active compounds were predicted by the CPI model, resulting in an expanded target dataset. This dataset was then used to predict synergistic antitumor compound combinations. The predicted multi-compound combinations were subsequently examined in cellular experiments.
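The target-expansion step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a CPI model exposes a scoring function that returns an interaction probability for a compound-protein pair, and that the expanded dataset is the union of known and predicted targets. The names `expand_targets`, `toy_predict_cpi`, the 0.5 threshold, and all scores and identifiers are hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable, Set


def expand_targets(
    compounds: Iterable[str],
    proteins: Iterable[str],
    predict_cpi: Callable[[str, str], float],
    known_targets: Dict[str, Set[str]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Dict[str, Set[str]]:
    """Merge known targets with targets whose predicted score passes the cutoff."""
    protein_list = list(proteins)  # materialize so it can be re-iterated per compound
    expanded: Dict[str, Set[str]] = {}
    for compound in compounds:
        predicted = {
            p for p in protein_list if predict_cpi(compound, p) >= threshold
        }
        expanded[compound] = set(known_targets.get(compound, set())) | predicted
    return expanded


# Made-up scores standing in for the output of a trained CPI model.
toy_scores = {
    ("compound_A", "protein_1"): 0.92,
    ("compound_A", "protein_2"): 0.10,
    ("compound_A", "protein_3"): 0.55,
    ("compound_B", "protein_1"): 0.30,
    ("compound_B", "protein_2"): 0.81,
    ("compound_B", "protein_3"): 0.05,
}


def toy_predict_cpi(compound: str, protein: str) -> float:
    """Hypothetical scorer: look up a fixed score, defaulting to 0.0."""
    return toy_scores.get((compound, protein), 0.0)


known = {"compound_A": {"protein_2"}}  # hypothetical curated annotation
expanded = expand_targets(
    ["compound_A", "compound_B"],
    ["protein_1", "protein_2", "protein_3"],
    toy_predict_cpi,
    known,
)
```

In this toy run, `compound_A` keeps its known target and gains the two proteins scored above the cutoff, while `compound_B` gains only one. In practice the cutoff would need to be chosen against validation data.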

RESULTS

Our CPI model outperformed other machine learning models, achieving an area under the receiver operating characteristic curve (AUROC) of 0.98, an area under the precision-recall curve (AUPR) of 0.98, and an accuracy (ACC) of 93.31% on the test set. The model's generalization capability and applicability were further confirmed using external databases. Using this model, we predicted the targets of compounds in the herb pair of Astragalus membranaceus and Hedyotis diffusa, yielding an expanded target dataset. We then integrated this expanded target dataset to predict effective drug combinations using our drug synergy prediction model, DeepMDS. Experimental assays on the breast cancer cell line MDA-MB-231 confirmed the efficacy of the best predicted multi-compound combinations: Combination I (Epicatechin, Ursolic acid, Quercetin, Aesculetin and Astragaloside IV) exhibited a half-maximal inhibitory concentration (IC50) value of 19.41 μM and a combination index (CI) value of 0.682, and Combination II (Epicatechin, Ursolic acid, Quercetin, Vanillic acid and Astragaloside IV) displayed an IC50 value of 23.83 μM and a CI value of 0.805 (by the usual convention, CI < 1 indicates synergy). These results validated the ability of our model to make accurate predictions for novel CPI data outside the training dataset and evaluated the reliability of the predictions, showing good potential for application in drug discovery and in the elucidation of the bioactive compounds in TCM.
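For readers less familiar with the reported metrics, the sketch below shows one common way to compute AUROC, a step-wise approximation of AUPR (average precision), and accuracy from binary labels and predicted scores. It is a plain-Python illustration on made-up toy data, not the authors' evaluation code. The 0.5 decision threshold for accuracy is an assumption, and the abstract does not state how AUPR was computed.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision values at the rank of each positive (approximates AUPR)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def accuracy(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> float:
    """Fraction of pairs classified correctly at the given (assumed) threshold."""
    correct = sum(
        1 for y, s in zip(labels, scores) if (s >= threshold) == (y == 1)
    )
    return correct / len(labels)


# Toy data: 1 = interacting pair, 0 = non-interacting pair (made up).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores_eval = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

toy_auroc = auroc(toy_labels, toy_scores_eval)
toy_ap = average_precision(toy_labels, toy_scores_eval)
toy_acc = accuracy(toy_labels, toy_scores_eval)
```

AUROC and AUPR are threshold-free ranking measures, whereas accuracy depends on the chosen cutoff, which is one reason all three are typically reported together.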

CONCLUSION

Our CPI prediction model can serve as a useful tool for accurately identifying potential CPIs for a wide range of proteins, and is expected to facilitate drug research and repurposing and to support the understanding of TCM.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3e65/11408283/c262f4ddb3bc/fphar-15-1465890-g001.jpg
