The Cancer Hospital of the University of Chinese Academy of Sciences (Zhejiang Cancer Hospital), Institute of Cancer and Basic Medicine (ICBM), Chinese Academy of Sciences, Beijing, China.
Department of Breast Surgery, Cancer Hospital of the University of Chinese Academy of Sciences, Beijing, China.
PLoS One. 2020 Nov 10;15(11):e0241924. doi: 10.1371/journal.pone.0241924. eCollection 2020.
To identify a gene signature for the prognosis of breast cancer using high-throughput analysis.
RNA-seq, single nucleotide polymorphism (SNP), and copy number variation (CNV) data, together with clinical follow-up information, were downloaded from The Cancer Genome Atlas (TCGA), and the samples were randomly divided into a training set and a test set. Genes related to breast cancer prognosis and differentially expressed genes (DEGs) with CNVs or SNPs were screened from the training set and then integrated for feature selection with RandomForest to identify robust biomarkers. Finally, a gene-based prognostic model was established, and its performance was verified in the TCGA test set, a Gene Expression Omnibus (GEO) validation set, and breast cancer subtypes.
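The split and integration steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the 50/50 split ratio, the use of a set intersection to "integrate" the gene lists, the sample IDs, and the placeholder names GENE_A, GENE_B, and GENE_C are all assumptions, since the abstract does not specify them.

```python
import random


def split_samples(sample_ids, train_fraction=0.5, seed=0):
    """Randomly split sample IDs into a training list and a held-out list.

    The 0.5 fraction is an assumption; the abstract gives no ratio.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def candidate_genes(prognostic, amplified, deleted, mutated):
    """Keep prognosis-related genes that also carry a genomic variant.

    Intersection is one plausible reading of "integrating"; it is an
    assumption, not a rule stated in the source.
    """
    variant_genes = set(amplified) | set(deleted) | set(mutated)
    return sorted(set(prognostic) & variant_genes)


# Toy inputs: hypothetical sample IDs and gene lists for illustration only.
samples = [f"S{i:03d}" for i in range(10)]
train, test = split_samples(samples)

candidates = candidate_genes(
    prognostic={"CD24", "RCC2", "GENE_A"},
    amplified={"CD24", "GENE_B"},
    deleted={"GENE_C"},
    mutated={"RCC2", "CASP8"},
)
print(len(train), len(test), candidates)
```

In a real analysis, the candidate list produced this way would then be passed to a feature-ranking method such as a random forest, which is omitted here to keep the sketch dependency-free.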
Screening identified 2287 prognosis-related genes and, from genomic variants, 131 genes with copy number amplifications, 724 genes with copy number deletions, and 280 genes with significant mutations, all closely correlated with the development of breast cancer. A total of 120 candidate genes were obtained by integrating the genomic-variant genes with the prognosis-related genes. Of these, 6 characteristic genes (CD24, PRRG1, IQSEC3, MRGPRX, RCC2, and CASP8) were top-ranked by RandomForest feature selection; notably, several of them have previously been reported to be associated with the progression of breast cancer. Cox regression analysis was performed to establish a 6-gene signature that stratified the risk of samples in the training set, test set, and external validation set. The five-year survival AUC of the model exceeded 0.65 in both the training set and the validation set. Thus, the 6-gene signature developed in the current study could serve as an independent prognostic factor for breast cancer patients.
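As an illustration of how a Cox-derived gene signature is typically applied, the sketch below computes a risk score as a weighted sum of expression values, splits samples at the median score, and evaluates discrimination with a simple rank-based AUC. The coefficients, expression values, and sample names are hypothetical and are not taken from the paper. The median cutoff is an assumption, and the AUC shown treats five-year death as a binary label and ignores censoring, so it is a simplification of a time-dependent survival AUC.

```python
from statistics import median

# Hypothetical Cox coefficients; the abstract does not report the fitted values.
COEFS = {
    "CD24": 0.20, "PRRG1": 0.30, "IQSEC3": 0.10,
    "MRGPRX": 0.40, "RCC2": 0.25, "CASP8": -0.30,
}


def risk_score(expression, coefs=COEFS):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def stratify(scores):
    """Label each sample 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


def binary_auc(scores, events):
    """Probability that an event sample outranks a non-event sample (ties = 0.5)."""
    pos = [scores[s] for s in scores if events[s]]
    neg = [scores[s] for s in scores if not events[s]]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy cohort: every gene has the same expression level within a sample.
levels = {"A": 1.0, "B": 2.0, "C": 0.0, "D": 3.0}
expression = {s: {g: x for g in COEFS} for s, x in levels.items()}
died_within_5y = {"A": False, "B": True, "C": False, "D": True}

scores = {s: risk_score(e) for s, e in expression.items()}
groups = stratify(scores)
auc = binary_auc(scores, died_within_5y)
print(groups, auc)
```

A production analysis would fit the coefficients with a survival library and use a censoring-aware, time-dependent ROC estimator rather than this binary approximation.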
This study constructed a 6-gene signature as a novel prognostic marker for predicting the survival of breast cancer patients, providing new diagnostic/prognostic biomarkers and therapeutic targets for breast cancer patients.