Kaur Harpreet, Dhall Anjali, Kumar Rajesh, Raghava Gajendra P S
Bioinformatics Center, CSIR-Institute of Microbial Technology, Chandigarh, India.
Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Front Genet. 2020 Jan 10;10:1306. doi: 10.3389/fgene.2019.01306. eCollection 2019.
The high mortality rate of hepatocellular carcinoma (HCC) is primarily due to its late diagnosis. Numerous attempts have been made to design genetic biomarkers for identifying HCC; unfortunately, most of these studies are based on small datasets obtained from a single platform, or they lack reasonable validation performance on external datasets. To identify a universal expression-based diagnostic biomarker panel for HCC that is applicable across multiple platforms, we employed large-scale transcriptomic profiling datasets containing a total of 2,316 HCC and 1,665 non-tumorous tissue samples. These samples were obtained from 30 studies generated mainly by four types of profiling techniques (Affymetrix, Illumina, Agilent, and high-throughput sequencing), which are implemented on a wide range of platforms. First, we identified 26 overlapping genes that are differentially expressed across numerous datasets. Subsequently, we identified a panel of three genes as an HCC biomarker using different feature selection techniques. The three-gene biomarker identified HCC samples in the training and validation datasets with an accuracy between 93 and 98% and an Area Under the Receiver Operating Characteristic curve (AUROC) in the range of 0.97 to 1.0. A reasonable performance (AUROC 0.91-0.96) on a validation dataset containing peripheral blood mononuclear cells supports the non-invasive utility of these genes. Furthermore, the prognostic potential of these genes was evaluated on the TCGA-LIHC and GSE14520 cohorts using univariate survival analysis. This analysis revealed that these genes are prognostic indicators for several survival endpoints in HCC patients (e.g., Overall Survival, Progression-Free Survival, Disease-Free Survival). These genes significantly stratified high-risk and low-risk HCC patients (p-value <0.05).
In conclusion, we identified a universal, platform-independent three-gene biomarker that can identify HCC patients with high precision and that also possesses significant prognostic potential. Finally, we developed a web server, HCCpred, based on this study to serve the scientific community (http://webs.iiitd.edu.in/raghava/hccpred/).
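For readers unfamiliar with the two diagnostic metrics reported above, the following minimal sketch shows how accuracy and AUROC can be computed from classifier scores. It is illustrative only and is not the authors' pipeline: the function names, the 0.5 threshold, and the toy scores are assumptions introduced here, and the numbers are synthetic rather than taken from the study.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of samples classified correctly when score >= threshold means HCC."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)


# Synthetic toy scores (NOT data from the study): 1 = HCC, 0 = non-tumorous.
toy_scores = [0.92, 0.81, 0.40, 0.77, 0.15, 0.30, 0.55, 0.08]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

print("AUROC:", auroc(toy_scores, toy_labels))
print("Accuracy:", accuracy(toy_scores, toy_labels, threshold=0.5))
```

AUROC depends only on how the scores rank the samples, whereas accuracy depends on the chosen threshold, which is one reason the two values are usually reported together.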