Jiawei Zhou, Min Mu, Yingru Xing, Xin Zhang, Danting Li, Yafeng Liu, Jun Xie, Wangfa Hu, Lijun Zhang, Jing Wu, Dong Hu
School of Medicine, Anhui University of Science and Technology, Huainan, China.
Key Laboratory of Industrial Dust Prevention and Control and Occupational Safety and Health, Ministry of Education, Anhui University of Science and Technology, Huainan, China.
Front Mol Biosci. 2020 Oct 27;7:561456. doi: 10.3389/fmolb.2020.561456. eCollection 2020.
The development of human tumors is associated with the abnormal expression of many functional genes, and the large tumor databases now available need to be mined in depth. Multigene prediction models make it possible to assess patient prognosis rapidly.
We selected three RNA expression profiles of lung adenocarcinoma (LUAD) (GSE32863, GSE10072, and GSE43458) from the Gene Expression Omnibus (GEO) database and identified the differentially expressed genes (DEGs) between tumor and normal tissue using the GEO2R tool. We then analyzed the transcriptome data of 479 LUAD samples (54 normal tissue samples and 425 cancer tissue samples) and the corresponding clinical follow-up data from The Cancer Genome Atlas (TCGA) database. Kaplan-Meier (KM) curves and receiver operating characteristic (ROC) curves were used to assess the prediction model, and multivariate Cox analysis was used to identify independent predictors. TCGA pancreatic adenocarcinoma datasets were used to establish a nomogram model.
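To make the Kaplan-Meier step concrete, here is a minimal sketch of the product-limit estimator in plain Python. It is an illustration only, not the authors' code: the function name and the input layout (parallel lists of follow-up times and 0/1 event indicators) are assumptions, and a study like this one would normally rely on an established survival-analysis package.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns (time, survival probability) pairs at each distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    data = sorted(zip(times, events))  # order patients by follow-up time
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []

    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # S(t) = S(t-) * (1 - d_t / n_t)
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    # Toy data, not from the study: five patients, two of them censored
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(f"t={t}: S={s:.2f}")
```

On the toy data the estimated survival drops to 0.80, 0.60, and 0.30 at the three death times. A KM plot compares curves estimated separately for groups such as high- and low-risk patients; the significance of the difference is usually judged with a log-rank test, which this sketch does not implement.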
KM and Cox analyses identified 98 genes significantly associated with prognosis, six of which were also DEGs in the GEO datasets. Multivariate analysis showed that no single gene could serve as an independent predictor of prognosis. However, a risk score calculated by weighting these six genes could serve as an independent prognostic predictor. In a Cox analysis including covariates such as age, gender, tumor stage, and TNM classification, the risk score remained an independent risk factor for patient survival (P = 0.013), with acceptable reliability (area under the curve, AUC = 0.665). By combining the risk score with clinical features, we constructed a nomogram that showed high concordance in predicting 3- and 5-year survival (concordance index = 0.751) and high accuracy on ROC analysis (AUC = 0.71 and 0.708 for 3- and 5-year survival, respectively).
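The abstract does not state how the six genes are weighted. A common choice in prognostic-signature studies, assumed here, is to use the coefficients of a multivariate Cox model, so that each patient's risk score is the weighted sum of the six genes' expression values; splitting patients into high- and low-risk groups at the median score is a further assumption. The sketch below does both and computes Harrell's concordance index, a standard definition of the concordance statistic reported for survival models. All gene names, weights, and data are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence


def risk_score(expression: Dict[str, float], weights: Dict[str, float]) -> float:
    """Risk score = sum over genes of weight_i * expression_i.

    The weights are assumed to be multivariate Cox coefficients; the abstract
    only says that the six genes are weighted.
    """
    return sum(w * expression[gene] for gene, w in weights.items())


def split_by_median(scores: Sequence[float]) -> List[str]:
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    scores: Sequence[float]) -> float:
    """Harrell's concordance index for a risk score (higher = worse prognosis).

    A pair (i, j) is comparable when patient i died (events[i] == 1) before
    patient j's follow-up ended (times[j] > times[i]). It is concordant when
    the patient who died first has the higher score; tied scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical gene names and weights, purely for illustration
    weights = {"GENE_A": 0.5, "GENE_B": -0.2}
    patients = [{"GENE_A": 2.0, "GENE_B": 1.0},
                {"GENE_A": 0.5, "GENE_B": 3.0},
                {"GENE_A": 1.0, "GENE_B": 0.5}]
    scores = [risk_score(p, weights) for p in patients]
    print(split_by_median(scores))
    print(harrell_c_index([10, 30, 20], [1, 0, 1], scores))
```

A concordance index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so values such as the 0.751 reported above indicate that patients with higher predicted risk tend to die earlier.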
We proposed a method for predicting the prognosis of LUAD by weighting multiple genes and constructed a nomogram suitable for the prognostic evaluation of LUAD, which could provide a new tool for identifying therapeutic targets and evaluating treatment efficacy in LUAD.