Anticancer drug synergy prediction based on CatBoost

Author information

Li Changheng, Guan Nana, Zhang Hongyi

Affiliation

College of Big Data Statistics, Guizhou University of Finance and Economics, Guiyang, China.

Publication information

PeerJ Comput Sci. 2025 May 19;11:e2829. doi: 10.7717/peerj-cs.2829. eCollection 2025.

Abstract

BACKGROUND

Cancer treatment is a long-standing focus of medical research, and multi-targeted drug combinations are regarded as an ideal option for it. Because neither clinical experience nor high-throughput screening can feasibly cover the complete space of possible combinations, machine learning models offer a way to explore that space effectively.

METHODS

In this work, we propose a CatBoost-based machine learning method to predict the synergy scores of anticancer drug combinations on cancer cell lines. The method relies on CatBoost's oblivious trees and ordered boosting to reduce overfitting and prediction bias. The model was trained and tested on data screened from the NCI-ALMANAC dataset. Drugs were characterized by Morgan fingerprints, drug target information, and monotherapy information; cell lines were described by gene expression profiles.
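To make the term "oblivious tree" concrete, the following is a minimal sketch of how such a tree evaluates one sample. It is an illustration only, not the authors' code and not the CatBoost library's implementation. The class name `ObliviousTree`, the helper `ensemble_predict`, the learning rate, and the toy splits and leaf values are all hypothetical. The defining property shown is that every level of the tree applies the same (feature, threshold) test to all samples, so a tree of depth d maps a sample to one of 2^d leaves through a d-bit index.

```python
from typing import List, Sequence, Tuple


class ObliviousTree:
    """Hypothetical depth-d oblivious (symmetric) tree: one test per level."""

    def __init__(self, splits: Sequence[Tuple[int, float]],
                 leaf_values: Sequence[float]) -> None:
        if len(leaf_values) != 2 ** len(splits):
            raise ValueError("a depth-d oblivious tree needs 2**d leaf values")
        self.splits: List[Tuple[int, float]] = list(splits)
        self.leaf_values: List[float] = list(leaf_values)

    def leaf_index(self, x: Sequence[float]) -> int:
        # Every sample answers the same d questions in the same order;
        # the d yes/no answers are packed into a d-bit leaf index.
        index = 0
        for feature, threshold in self.splits:
            index = (index << 1) | int(x[feature] > threshold)
        return index

    def predict(self, x: Sequence[float]) -> float:
        return self.leaf_values[self.leaf_index(x)]


def ensemble_predict(trees: Sequence[ObliviousTree], x: Sequence[float],
                     learning_rate: float = 0.1) -> float:
    # A boosted ensemble sums the (scaled) outputs of its trees.
    return learning_rate * sum(tree.predict(x) for tree in trees)


# Toy example with made-up numbers: the feature vector stands in for the
# concatenated drug and cell-line descriptors mentioned in the abstract.
tree = ObliviousTree(splits=[(0, 0.5), (2, 1.0)],
                     leaf_values=[-1.0, -0.2, 0.3, 1.5])
sample = [1.0, 0.0, 2.0]
print(tree.leaf_index(sample), tree.predict(sample))
```

Because the same test is shared across a whole level, the tree structure is fully described by d splits rather than up to 2^d - 1, which limits model complexity per tree.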

RESULTS

In stratified 5-fold cross-validation, our method achieved an area under the receiver operating characteristic curve (ROC AUC) of 0.9217, an area under the precision-recall curve (PR AUC) of 0.4651, a mean squared error (MSE) of 0.1365, and a Pearson correlation coefficient of 0.5335. This performance is significantly better than that of three other advanced models. Additionally, when SHapley Additive exPlanations (SHAP) was used to interpret the biological significance of the predictions, drug features played more prominent roles than cell line features, and genes associated with cancer development, such as PTK2, CCND1, and GNA11, contributed substantially to drug synergy prediction. Taken together, these results indicate that the proposed model predicts well and can serve as an alternative method for predicting anticancer drug combinations.
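As an illustration of the evaluation quantities named above, the sketch below computes MSE, the Pearson correlation coefficient, and ROC AUC in plain Python, and builds stratified folds. It is a minimal sketch under stated assumptions, not the authors' evaluation code: all function names are hypothetical, the fold assignment is a simple unshuffled round-robin per class, and the abstract does not say which variable the folds were stratified on or how synergy scores were thresholded into classes. PR AUC is omitted for brevity.

```python
import math
from typing import List, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    # ROC AUC equals the probability that a random positive is scored
    # above a random negative, with ties counted as one half.
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int = 5) -> List[List[int]]:
    # Deal the indices of each class round-robin into k folds so that
    # every fold keeps roughly the overall class proportions.
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, l in enumerate(labels) if l == cls]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


# Made-up toy labels: 8 non-synergistic (0) and 2 synergistic (1) samples.
toy_labels = [0] * 8 + [1] * 2
for fold in stratified_folds(toy_labels, k=2):
    print(fold, [toy_labels[i] for i in fold])
```

In practice the indices would normally be shuffled within each class before being dealt into folds; that step is left out here to keep the example deterministic.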

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9246/12190655/cdbc383d1822/peerj-cs-11-2829-g001.jpg
