Yang Hong, Zhang Xin, Cai Xiao-Yong, Wen Dong-Yue, Ye Zhi-Hua, Liang Liang, Zhang Lu, Wang Han-Lin, Chen Gang, Feng Zhen-Bo
Department of Ultrasonography, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China.
Department of Pathology, First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China.
PeerJ. 2017 Mar 14;5:e3089. doi: 10.7717/peerj.3089. eCollection 2017.
Liver hepatocellular carcinoma accounts for the overwhelming majority of primary liver cancers. Its late diagnosis and poor prognosis call for novel biomarkers, and in the era of big data, bioinformatics and computational techniques can help to discover them.
Big data aggregated from The Cancer Genome Atlas and from Natural Language Processing were integrated to identify differentially expressed genes. These genes were then subjected to Gene Ontology enrichment analysis, Kyoto Encyclopedia of Genes and Genomes and Panther pathway enrichment analyses, and protein-protein interaction network analysis. A pathway that ranked highly in the enrichment analysis was investigated further, and its top-priority genes were assessed for their diagnostic and prognostic value.
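The two computational steps described above, combining gene lists from two sources and testing a pathway for enrichment, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not state which statistical test the enrichment tools applied, so a one-sided hypergeometric over-representation test is assumed here as a common choice, and the function names and the toy gene lists are hypothetical.

```python
from math import comb


def overlap(genes_a, genes_b):
    """Return the sorted gene symbols present in both lists (case-insensitive)."""
    return sorted({g.upper() for g in genes_a} & {g.upper() for g in genes_b})


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_hits):
    """One-sided over-representation p-value, P(X >= n_hits).

    Assumed model: n_selected genes are drawn without replacement from
    n_background genes, of which n_pathway belong to the pathway.
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy lists for illustration only; they are not the study's data.
    expression_hits = ["BIRC5", "E2F1", "GENE_X"]
    literature_hits = ["birc5", "CCNE1", "e2f1"]
    print(overlap(expression_hits, literature_hits))

    # Toy counts: 10 background genes, 3 in the pathway, 4 selected, 3 hits.
    print(hypergeom_enrichment_p(10, 3, 4, 3))
```

A smaller p-value means the selected list contains more pathway genes than random sampling from the background would be expected to give.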
A list of 389 genes was generated by overlapping genes from The Cancer Genome Atlas and Natural Language Processing. Three pathways showed top priority, and the one specifically associated with cancer, 'pathways in cancer,' was analyzed through its four highlighted genes, namely BIRC5, E2F1, CCNE1, and CDKN2A, which were validated using Oncomine. The detection pool composed of the four genes showed strong diagnostic power, with an integrated AUC of 0.990 (95% CI [0.982-0.998], P < 0.001, sensitivity: 96.0%, specificity: 96.5%). BIRC5 (P = 0.021) and CCNE1 (P = 0.027) were associated with poor prognosis, whereas CDKN2A (P = 0.066) and E2F1 (P = 0.088) showed no statistically significant association.
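For readers unfamiliar with the diagnostic metrics reported above, the following sketch shows how an AUC, a sensitivity, and a specificity can be computed from per-sample scores. It rests on stated assumptions: the abstract does not say how the four genes were combined into one score, so the code takes a single combined score per sample as given, and the function names, scores, and threshold are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def sensitivity_specificity(scores_pos, scores_neg, threshold):
    """Call a sample positive when its score is >= threshold, then return
    (sensitivity, specificity)."""
    true_pos = sum(s >= threshold for s in scores_pos)
    true_neg = sum(s < threshold for s in scores_neg)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)


if __name__ == "__main__":
    # Toy scores for illustration only; they are not the study's data.
    tumor_scores = [0.9, 0.8, 0.4]
    normal_scores = [0.5, 0.3, 0.2]
    print(auc(tumor_scores, normal_scores))
    print(sensitivity_specificity(tumor_scores, normal_scores, 0.6))
```

The pairwise definition is quadratic in the number of samples, which is fine for a sketch; rank-based formulas give the same value more efficiently on large cohorts.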
The study illustrates liver hepatocellular carcinoma gene signatures, related pathways and networks from the perspective of big data, featuring the prioritized cancer-specific pathway, 'pathways in cancer.' The detection pool of the four highlighted genes, namely BIRC5, E2F1, CCNE1 and CDKN2A, merits further investigation given the strong evidence for its diagnostic value, and the prognostic value of BIRC5 and CCNE1 also deserves attention.