Computer Engineering Department, Dokuz Eylul Universitesi, 35160, Izmir, Turkey.
Comput Biol Med. 2017 Oct 1;89:397-404. doi: 10.1016/j.compbiomed.2017.08.028. Epub 2017 Aug 26.
Integrating several types of patient data in a computational framework can accelerate the identification of more reliable biomarkers, especially for prognostic purposes. This study aims to identify biomarkers that can predict the potential survival time of a cancer patient by integrating transcriptomic (RNA-Seq), proteomic (RPPA), and protein-protein interaction (PPI) data. The proposed method, RPBioNet, employs a random walk-based algorithm that operates on a PPI network to identify a limited number of protein biomarkers. The method then uses gene expression measurements of the selected biomarkers to train a classifier that predicts patient survival time. RPBioNet was applied to classify kidney renal clear cell carcinoma (KIRC), glioblastoma multiforme (GBM), and lung squamous cell carcinoma (LUSC) patients by survival time class (long- or short-term). It correctly identified the survival time classes of patients with average accuracies between 66% and 78% across the three data sets. RPBioNet operates with only 20 to 50 biomarkers and achieves, on average, 6% higher accuracy than the closest alternative method, which uses only RNA-Seq data for biomarker selection. Further analysis of the most predictive biomarkers highlighted genes that are common to both cancer types and that may be driver proteins responsible for cancer progression. The novelty of this study is the integration of a PPI network with mRNA and protein expression data to identify more accurate prognostic biomarkers that could be used for clinical purposes in the future.
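The abstract does not specify which random walk variant RPBioNet uses or how the walk is seeded, so the following is only a minimal sketch of one common choice, random walk with restart, on a toy PPI network. The protein names (`P1` to `P5`), the edges, the seed scores, and the `restart` value are all hypothetical; in a real pipeline the seed scores might be derived from expression data, and the top-ranked proteins would then supply the features for a survival-class classifier.

```python
import numpy as np


def random_walk_with_restart(adjacency, seed_scores, restart=0.3,
                             tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the L1 change is below tol."""
    adjacency = np.asarray(adjacency, dtype=float)
    # Column-normalize: column j holds the transition probabilities out of node j.
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    transition = adjacency / col_sums

    # Normalize the seed scores into a restart distribution.
    p0 = np.asarray(seed_scores, dtype=float)
    p0 = p0 / p0.sum()

    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def top_k_proteins(names, scores, k):
    """Return the k protein names with the highest walk scores."""
    order = np.argsort(scores)[::-1][:k]
    return [names[i] for i in order]


# Hypothetical toy PPI network (undirected, unweighted).
proteins = ["P1", "P2", "P3", "P4", "P5"]
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P4", "P5")]
index = {name: i for i, name in enumerate(proteins)}
adjacency = np.zeros((len(proteins), len(proteins)))
for a, b in edges:
    adjacency[index[a], index[b]] = 1.0
    adjacency[index[b], index[a]] = 1.0

# Assumed seed: all prior evidence placed on P1.
seed_scores = [1.0, 0.0, 0.0, 0.0, 0.0]
scores = random_walk_with_restart(adjacency, seed_scores)
selected = top_k_proteins(proteins, scores, k=3)
print(selected)
```

The restart term keeps the walk close to the seed proteins, so proteins that are well connected to the seeds rank above distant ones. Capping the selection at `k` mirrors the abstract's point that only a small biomarker set (20 to 50 proteins) is used.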