Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data.

Author information

Nguyen Linh, Dang Cuong C, Ballester Pedro J

Affiliations

Cancer Research Center of Marseille, INSERM U1068, Marseille, France; Institut Paoli-Calmettes, Marseille, France; Aix-Marseille Université, Marseille, France; Cancer Research Center of Marseille UMR7258, Marseille, France.

Publication information

F1000Res. 2016 Dec 28;5. doi: 10.12688/f1000research.10529.2. eCollection 2016.

DOI:10.12688/f1000research.10529.2

PMID:28299173

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5310525/

Abstract

Selected gene mutations are routinely used to guide the selection of cancer drugs for a given patient tumour. Large pharmacogenomic data sets, such as those generated by the Genomics of Drug Sensitivity in Cancer (GDSC) consortium, were introduced to discover more of these single-gene markers of drug sensitivity. Very recently, machine learning regression has been used to investigate how well cancer cell line sensitivity to drugs is predicted depending on the type of molecular profile. This work has revealed that gene expression data is the most predictive profile in the pan-cancer setting. However, no study to date has exploited GDSC data to systematically compare the performance of machine learning models based on multi-gene expression data against that of widely-used single-gene markers based on genomics data. Here we present this systematic comparison using Random Forest (RF) classifiers exploiting the expression levels of 13,321 genes and an average of 501 tested cell lines per drug. To account for time-dependent batch effects in IC50 measurements, we employ independent test sets generated with more recent GDSC data than that used to train the predictors, and show that this is a more realistic validation than standard k-fold cross-validation. Across 127 GDSC drugs, our results show that the single-gene markers unveiled by the GDSC MANOVA analysis tend to achieve higher precision than these RF-based multi-gene models, at the cost of generally having poor recall (i.e. correctly detecting only a small part of the cell lines sensitive to the drug). Regarding overall classification performance, about two thirds of the drugs are better predicted by the multi-gene RF classifiers. Among the drugs for which these models were most predictive, we found pyrimethamine, sunitinib and 17-AAG. Thanks to this unbiased validation, we now know that this type of model can predict tumour response to some of these drugs. These models can thus be further investigated on tumour models.
R code to facilitate the construction of alternative machine learning models and their validation in the presented benchmark is available at http://ballester.marseille.inserm.fr/gdsc.transcriptomicDatav2.tar.gz.
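To make the precision/recall trade-off described in the abstract concrete, the following Python sketch computes precision, recall and the Matthews correlation coefficient (MCC) for two hypothetical predictors on invented labels for 20 cell lines (1 = sensitive, 0 = resistant). The labels, the predictors and the choice of MCC as the "overall classification performance" score are illustrative assumptions only; they are not data or code from the study, whose own R implementation is available at the URL above.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels: 1 = sensitive, 0 = resistant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Precision, recall and Matthews correlation coefficient (MCC)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    # Precision: fraction of predicted-sensitive cell lines that are truly sensitive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of truly sensitive cell lines that the predictor detects.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # MCC uses all four confusion-matrix cells; set to 0.0 if any marginal is zero.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}


# Hypothetical labels for 20 cell lines: the first 6 are sensitive.
y_true = [1] * 6 + [0] * 14

# Hypothetical single-gene marker: flags only 2 cell lines, both truly sensitive.
marker = [1, 1, 0, 0, 0, 0] + [0] * 14

# Hypothetical multi-gene classifier: flags 7 cell lines, 5 of them truly sensitive.
multi_gene = [1, 1, 1, 1, 1, 0] + [1, 1] + [0] * 12

for name, pred in [("single-gene marker", marker), ("multi-gene classifier", multi_gene)]:
    m = summarize(y_true, pred)
    print(f"{name}: precision={m['precision']:.2f} "
          f"recall={m['recall']:.2f} MCC={m['mcc']:.2f}")
# single-gene marker: precision=1.00 recall=0.33 MCC=0.51
# multi-gene classifier: precision=0.71 recall=0.83 MCC=0.66
```

With these invented numbers the marker wins on precision while the multi-gene classifier wins on recall and on the overall score, which mirrors the qualitative pattern the abstract reports; real per-drug values depend on the GDSC data and the authors' validation protocol.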

Similar articles

Systematic assessment of multi-gene predictors of pan-cancer cell line sensitivity to drugs exploiting gene expression data.

F1000Res. 2016 Dec 28;5. doi: 10.12688/f1000research.10529.2. eCollection 2016.

Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours.

Oncotarget. 2017 Sep 15;8(57):97025-97040. doi: 10.18632/oncotarget.20923. eCollection 2017 Nov 14.

Improved anticancer drug response prediction in cell lines using matrix factorization with similarity regularization.

BMC Cancer. 2017 Aug 2;17(1):513. doi: 10.1186/s12885-017-3500-5.

Super.FELT: supervised feature extraction learning using triplet loss for drug response prediction with multi-omics data.

BMC Bioinformatics. 2021 May 25;22(1):269. doi: 10.1186/s12859-021-04146-z.

Deep-Resp-Forest: A deep forest model to predict anti-cancer drug response.

Methods. 2019 Aug 15;166:91-102. doi: 10.1016/j.ymeth.2019.02.009. Epub 2019 Feb 14.

Computational identification of multi-omic correlates of anticancer therapeutic response.

BMC Genomics. 2014;15 Suppl 7(Suppl 7):S2. doi: 10.1186/1471-2164-15-S7-S2. Epub 2014 Oct 27.

Two-step multi-omics modelling of drug sensitivity in cancer cell lines to identify driving mechanisms.

PLoS One. 2020 Nov 23;15(11):e0238961. doi: 10.1371/journal.pone.0238961. eCollection 2020.

Revisiting inconsistency in large pharmacogenomic studies.

F1000Res. 2016 Sep 16;5:2333. doi: 10.12688/f1000research.9611.3. eCollection 2016.

Precision Oncology beyond Targeted Therapy: Combining Omics Data with Machine Learning Matches the Majority of Cancer Cells to Effective Therapeutics.

Mol Cancer Res. 2018 Feb;16(2):269-278. doi: 10.1158/1541-7786.MCR-17-0378. Epub 2017 Nov 13.

Mixture classification model based on clinical markers for breast cancer prognosis.

Artif Intell Med. 2010 Feb-Mar;48(2-3):129-37. doi: 10.1016/j.artmed.2009.07.008. Epub 2009 Dec 14.

Cited by

Predicting atezolizumab response in metastatic urothelial carcinoma patients using machine learning on integrated tumour gene expression and clinical data.

NPJ Precis Oncol. 2025 Jun 10;9(1):170. doi: 10.1038/s41698-025-00969-8.

Large-Scale Machine Learning Analysis Reveals DNA Methylation and Gene Expression Response Signatures for Gemcitabine-Treated Pancreatic Cancer.

Health Data Sci. 2024 Jan 8;4:0108. doi: 10.34133/hds.0108. eCollection 2024.

Drug mechanism enrichment analysis improves prioritization of therapeutics for repurposing.

BMC Bioinformatics. 2023 May 24;24(1):215. doi: 10.1186/s12859-023-05343-8.

A Boolean-based machine learning framework identifies predictive biomarkers of HSP90-targeted therapy response in prostate cancer.

Front Mol Biosci. 2023 Jan 19;10:1094321. doi: 10.3389/fmolb.2023.1094321. eCollection 2023.

Interpretable Machine Learning Models to Predict the Resistance of Breast Cancer Patients to Doxorubicin from Their microRNA Profiles.

Adv Sci (Weinh). 2022 Aug;9(24):e2201501. doi: 10.1002/advs.202201501. Epub 2022 Jul 3.

Multiparametric High-Content Cell Painting Identifies Copper Ionophores as Selective Modulators of Esophageal Cancer Phenotypes.

ACS Chem Biol. 2022 Jul 15;17(7):1876-1889. doi: 10.1021/acschembio.2c00301. Epub 2022 Jun 13.

Predicting Cancer Drug Response In Vivo by Learning an Optimal Feature Selection of Tumour Molecular Profiles.

Biomedicines. 2021 Sep 26;9(10):1319. doi: 10.3390/biomedicines9101319.

A Methodological Framework to Discover Pharmacogenomic Interactions Based on Random Forests.

Genes (Basel). 2021 Jun 18;12(6):933. doi: 10.3390/genes12060933.

Predicting tumor response to drugs based on gene-expression biomarkers of sensitivity learned from cancer cell lines.

BMC Genomics. 2021 Apr 15;22(1):272. doi: 10.1186/s12864-021-07581-7.

Impact of between-tissue differences on pan-cancer predictions of drug sensitivity.

PLoS Comput Biol. 2021 Feb 25;17(2):e1008720. doi: 10.1371/journal.pcbi.1008720. eCollection 2021 Feb.
