Department of Bioinformatics, Muğla Sıtkı Koçman University, Muğla, Turkey.
Department of Molecular Biology and Genetics, Muğla Sıtkı Koçman University, Muğla, Turkey.
BMC Bioinformatics. 2020 Sep 30;21(Suppl 14):368. doi: 10.1186/s12859-020-03691-3.
Lung cancer is the leading cause of cancer-related deaths worldwide, and lung adenocarcinoma is its most common form. To understand the molecular basis of lung adenocarcinoma, integrative analyses have been performed using genomics, transcriptomics, epigenomics, proteomics and clinical data. In addition, molecular prognostic signatures have been generated for lung adenocarcinoma from gene expression levels in tumor samples. However, signatures that combine different types of molecular data are still needed, including cohort-based or patient-based biomarkers that are candidates for molecular targeting.
We built an R pipeline to carry out an integrated meta-analysis of genomic alterations (single-nucleotide variations and copy number variations), transcriptomic variations measured by RNA-seq, and clinical data of patients with lung adenocarcinoma in The Cancer Genome Atlas (TCGA) project. To construct a prognostic signature, we integrated significant genes carrying single-nucleotide variations or copy number variations, differentially expressed genes, and genes in active subnetworks. A Cox proportional hazards model with a Lasso penalty and leave-one-out cross-validation (LOOCV) was used to identify the best gene signature among the different gene categories. We determined a 12-gene signature (BCHE, CCNA1, CYP24A1, DEPTOR, MASP2, MGLL, MYO1A, PODXL2, RAPGEF3, SGK2, TNNI2, ZBTB16) for prognostic risk prediction based on the overall survival time of patients with lung adenocarcinoma. Patients in both the training and test data were clustered into high-risk and low-risk groups using risk scores calculated from the selected gene signature. The overall survival probability of these risk groups differed significantly in both the training and test datasets.
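The risk-score step described above can be illustrated with a minimal sketch. It assumes the risk score is the linear predictor of the fitted Cox model (the sum of each gene's coefficient multiplied by its expression level) and that patients are split at the median score. The abstract states neither the cutoff rule nor the fitted coefficients, so the coefficient values, patient identifiers and expression values below are hypothetical placeholders, and Python is used here purely for illustration although the pipeline itself was written in R.

```python
from statistics import median

# Hypothetical coefficients for three of the signature genes.
# These are illustrative placeholders, NOT fitted values from the study.
COEFFICIENTS = {"BCHE": -0.10, "CCNA1": 0.20, "CYP24A1": 0.15}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Return the Cox linear predictor: sum of coefficient * expression."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def assign_risk_groups(patients, coefficients=COEFFICIENTS):
    """Score every patient and split them at the median score (an assumed cutoff)."""
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, cutoff, groups


# Hypothetical expression values for four made-up patients.
patients = {
    "P1": {"BCHE": 2.0, "CCNA1": 1.0, "CYP24A1": 0.0},
    "P2": {"BCHE": 0.0, "CCNA1": 3.0, "CYP24A1": 2.0},
    "P3": {"BCHE": 1.0, "CCNA1": 0.0, "CYP24A1": 1.0},
    "P4": {"BCHE": 0.0, "CCNA1": 2.0, "CYP24A1": 4.0},
}

scores, cutoff, groups = assign_risk_groups(patients)
for pid in sorted(groups):
    print(pid, round(scores[pid], 3), groups[pid])
```

In a real analysis the coefficients would come from the penalized Cox fit on the training data, and the same cutoff would then be applied unchanged to the test data before comparing the survival curves of the two groups.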
This 12-gene signature could predict the prognostic risk of patients with lung adenocarcinoma in TCGA, and its genes are potential predictors for survival-based risk clustering of these patients. The genes can be used to cluster patients according to their molecular profile, so that the best drug candidates can be proposed for each patient cluster. They also have high potential as targets for cancer therapy in patients with lung adenocarcinoma.