Bioinformatics Centre, CSIR-Institute of Microbial Technology, Sector 39A, Chandigarh-160036, India.
CSIR-Central Scientific Instruments Organization, Sector 30C, Chandigarh-160030, India.
Sci Rep. 2017 Mar 28;7:44997. doi: 10.1038/srep44997.
This study aimed to identify expression-based gene biomarkers that discriminate between early- and late-stage clear cell renal cell carcinoma (ccRCC). We analyzed gene expression in 523 samples to identify genes that are differentially expressed between early- and late-stage ccRCC. First, we developed a threshold-based method, which attained a maximum accuracy of 71.12% with ROC 0.67 using the single gene NR3C2. To improve the performance of the threshold-based method, we combined two or more genes and achieved a maximum accuracy of 70.19% with ROC 0.74 using eight genes on the validation dataset. Of these eight genes, four are underexpressed (NR3C2, ENAM, DNASE1L3, FRMPD2) and four are overexpressed (PLEKHA9, MAP6D1, SMPD4, C11orf73) in late-stage ccRCC. Second, we developed models using state-of-the-art techniques and achieved a maximum accuracy of 72.64% with ROC 0.81 using 64 genes on the validation dataset. Similar accuracy was obtained with 38 genes selected from the subset of genes involved in cancer hallmark biological processes. Our analysis further indicated a need to develop gender-specific models for stage classification. A web server, CancerCSP, has been developed to predict the stage of ccRCC from gene expression data derived from RNA-seq experiments.
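The single-gene threshold idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the midpoint-based threshold search, and the toy expression values are all assumptions made for illustration. It assumes a gene that is underexpressed in late stage (as reported for NR3C2), so a sample is called late stage when its expression falls below the threshold.

```python
from typing import List, Tuple


def predict_stage(expr: float, threshold: float, under_in_late: bool = True) -> int:
    """Return 1 (late stage) or 0 (early stage) for one expression value.

    under_in_late=True means the gene is assumed to be underexpressed in
    late stage, so values below the threshold are called late.
    """
    if under_in_late:
        return 1 if expr < threshold else 0
    return 1 if expr > threshold else 0


def accuracy(values: List[float], labels: List[int],
             threshold: float, under_in_late: bool = True) -> float:
    """Fraction of samples whose predicted stage matches the label."""
    correct = sum(
        predict_stage(v, threshold, under_in_late) == y
        for v, y in zip(values, labels)
    )
    return correct / len(labels)


def best_threshold(values: List[float], labels: List[int],
                   under_in_late: bool = True) -> Tuple[float, float]:
    """Scan midpoints between sorted distinct values and keep the first
    threshold that gives the highest accuracy (a hypothetical search rule)."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = max(candidates, key=lambda t: accuracy(values, labels, t, under_in_late))
    return best, accuracy(values, labels, best, under_in_late)


# Toy data, invented for illustration only (not from the study).
toy_expr = [8.1, 7.9, 7.4, 6.8, 5.2, 4.9, 6.9, 4.1]
toy_stage = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = early, 1 = late

threshold, acc = best_threshold(toy_expr, toy_stage)
print(f"threshold={threshold:.2f} accuracy={acc:.3f}")
```

In practice the threshold would be chosen on training data and the accuracy reported on a separate validation set, as the abstract describes for the multi-gene result. For an overexpressed gene, the same functions apply with `under_in_late=False`.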