Fa Botao, Luo Chengwen, Tang Zhou, Yan Yuting, Zhang Yue, Yu Zhangsheng
Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China.
SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China.
EBioMedicine. 2019 Jun;44:250-260. doi: 10.1016/j.ebiom.2019.05.010. Epub 2019 May 14.
Although many prognostic single-gene (SG) lists have been identified in cancer research, their application is hampered by poor robustness and performance on independent datasets. Pathway-based approaches, which embed biological knowledge to yield reproducible features, have therefore emerged.
Pathifier estimates a pathway deregulation score (PDS) from expression data to represent the extent to which a pathway is deregulated. Most of its applications, however, treat pathways as independent and ignore the effect of genes shared between pathway pairs, which we refer to as crosstalk. Here, we propose a novel procedure based on the Pathifier methodology that, for the first time, accommodates crosstalk when identifying disease-specific features to predict prognosis in patients with hepatocellular carcinoma (HCC).
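To make the two ideas in the paragraph above concrete, the Python sketch below shows (i) crosstalk measured as the genes shared by each pathway pair and (ii) a deregulation score for a single pathway. It is an illustration under stated assumptions, not the Pathifier or PATHcrosstalk implementation. The function names `pathway_crosstalk` and `linear_pds` are hypothetical, and the Jaccard index is an assumed overlap measure. Pathifier fits a principal curve through the samples, but this sketch swaps that curve for the first principal component (a straight line). Each score is therefore a distance along that line from the normal samples, scaled to [0, 1].

```python
from itertools import combinations

import numpy as np


def pathway_crosstalk(pathways):
    """Hypothetical helper: list every pathway pair that shares genes.

    pathways: dict mapping pathway name -> set of gene symbols.
    Returns (name_a, name_b, sorted shared genes, Jaccard index) tuples.
    The Jaccard index is an assumed overlap measure, used for illustration.
    """
    pairs = []
    for (a, genes_a), (b, genes_b) in combinations(pathways.items(), 2):
        shared = genes_a & genes_b
        if shared:  # a non-empty overlap counts as crosstalk here
            jaccard = len(shared) / len(genes_a | genes_b)
            pairs.append((a, b, sorted(shared), jaccard))
    return pairs


def linear_pds(expr, is_normal):
    """Hypothetical, simplified deregulation score for one pathway.

    expr: samples x genes array restricted to the pathway's genes.
    is_normal: boolean NumPy array marking the normal (reference) samples.
    Assumption: the principal curve used by Pathifier is replaced by the
    first principal component, so this is a linear approximation only.
    """
    # Standardize each gene so that no single gene dominates the axis.
    x = (expr - expr.mean(axis=0)) / (expr.std(axis=0) + 1e-12)
    # First principal axis from the SVD of the standardized data.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[0]
    # The SVD sign is arbitrary: orient the axis so normal samples sit low.
    if np.median(proj[is_normal]) > np.median(proj[~is_normal]):
        proj = -proj
    # Distance along the axis from the normal samples' median, floored at 0.
    dist = np.clip(proj - np.median(proj[is_normal]), 0, None)
    # Scale to [0, 1] across samples.
    return dist / dist.max() if dist.max() > 0 else dist
```

With pathways `{"P1": {"A", "B", "C"}, "P2": {"C", "D"}}`, `pathway_crosstalk` reports one pair sharing gene `C`. A crosstalk-aware procedure must then decide how such shared genes contribute to each pathway's score.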
In a cohort of HCC patients (N = 355) from The Cancer Genome Atlas (TCGA), cross-validation (CV) showed that the identified PDSs were more robust and accurate than SG features when used in a deep learning (DL)-based prediction approach. When validated on external HCC datasets, these features consistently outperformed the SGs.
On average, our approach improved prediction accuracy by 10.2%. Importantly, the governing genes in these features provide valuable insight into the cancer hallmarks of HCC. We developed an R package, PATHcrosstalk (available on GitHub at https://github.com/fabotao/PATHcrosstalk), with which users can discover pathways of interest while accounting for crosstalk effects.