A Machine Learning Model to Predict the Triple Negative Breast Cancer Immune Subtype.

Author information

Department of Urology, University of Freiburg, Freiburg, Germany.

Department of Breast Surgery, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, China.

Publication information

Front Immunol. 2021 Sep 17;12:749459. doi: 10.3389/fimmu.2021.749459. eCollection 2021.

Abstract

BACKGROUND

Immune checkpoint blockade (ICB) has been approved for the treatment of triple-negative breast cancer (TNBC) because it significantly improves progression-free survival (PFS). However, the response rate is low, with only about 10% of TNBC patients achieving a complete response (CR), and ICB can cause adverse reactions.

METHODS

Open datasets from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) were downloaded, and unsupervised clustering of the expression profiles was performed to identify immune subtypes. Prognosis, enriched pathways, and ICB indicators were compared between the immune subtypes. Samples from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset were then used to validate the correlation of immune subtype with prognosis, and data from patients who received ICB were used to validate the correlation of immune subtype with ICB response. Machine learning models were used to build a visual web server that predicts the immune subtype of TNBC patients requiring ICB.
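The unsupervised clustering step can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so plain two-group k-means is used here purely as a stand-in, and the expression values, the function names `squared_distance` and `kmeans_two_groups`, and the three-gene toy matrix are all hypothetical rather than taken from the study.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two_groups(samples, n_iter=100, seed=0):
    """Split samples into two clusters with plain k-means (illustrative stand-in)."""
    rng = random.Random(seed)
    # Start from two randomly chosen samples as the initial centroids.
    centroids = [list(c) for c in rng.sample(samples, 2)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            0 if squared_distance(s, centroids[0]) <= squared_distance(s, centroids[1]) else 1
            for s in samples
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical expression matrix: rows are samples, columns are three immune-related genes.
toy_expression = [
    [8.1, 7.9, 8.4], [7.6, 8.2, 8.0], [8.3, 7.7, 7.9], [7.9, 8.0, 8.2],  # immune-high
    [2.1, 1.8, 2.4], [1.9, 2.2, 2.0], [2.3, 2.1, 1.7], [2.0, 1.9, 2.2],  # immune-low
]
labels = kmeans_two_groups(toy_expression)
print(labels)
```

The cluster numbers themselves are arbitrary; only the grouping matters, which is why a real analysis would characterize each cluster afterward (for example by immune score) before naming it S1 or S2.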

RESULTS

Eight open datasets comprising 931 TNBC samples were used for the unsupervised clustering. Two novel immune subtypes (referred to as S1 and S2) were identified among TNBC patients. Compared with S2, S1 was associated with higher immune scores, higher levels of immune cells, and a better prognosis for immunotherapy. In the validation dataset, S1 samples had a better prognosis than S2 samples in both overall survival (OS) (p = 0.00036) and relapse-free survival (RFS) (p = 0.0022). Bioinformatics analysis identified 11 hub genes (LCK, IL2RG, CD3G, STAT1, CD247, IL2RB, CD3D, IRF1, OAS2, IRF4, and IFNG) related to the immune subtype. A machine learning model based on the random forest algorithm was built from these 11 hub genes and performed reasonably well, with an area under the receiver operating characteristic curve (AUC) of 0.76. A free, open web server based on the random forest model, named triple-negative breast cancer immune subtype (TNBCIS), was developed and is available at https://immunotypes.shinyapps.io/TNBCIS/.
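To make the reported AUC concrete: the AUC equals the probability that a randomly chosen positive sample receives a higher predicted score than a randomly chosen negative sample, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is not the authors' random forest; the function name `roc_auc`, the labels (1 standing for one subtype, 0 for the other), and the scores are hypothetical toy values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels
    scores: predicted scores, higher meaning "more likely class 1"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for eight samples (not data from the study).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.4f}")
```

Under this reading, an AUC of 0.76 means the model ranks a sample of one subtype above a sample of the other in roughly three out of four such pairs, where 0.5 corresponds to chance and 1.0 to perfect separation.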

CONCLUSION

Open TNBC datasets allowed us to stratify samples into distinct immunotherapy response subgroups according to gene expression profiles. Based on the two novel subtypes, the free online web server we developed could help select ICB candidates with a higher response rate and better prognosis.

