School of Computer Science and Technology, Xidian University, Xi'an 710071, China.
Biomed Res Int. 2013;2013:401649. doi: 10.1155/2013/401649. Epub 2013 Sep 1.
Great efforts have been devoted to alleviating the uncertainty of detected cancer genes, because accurate identification of oncogenes is of tremendous significance and helps unravel the biological behavior of tumors. In this paper, we present a differential network-based framework to detect biologically meaningful cancer-related genes. First, we propose a gene regulatory network construction algorithm that employs boosting regression based on a likelihood score and an informative prior to improve identification accuracy. Second, using this algorithm, two gene regulatory networks are constructed independently from case and control samples. Third, by subtracting one network from the other, a differential network model is obtained and then used to rank differentially expressed hub genes for the identification of cancer biomarkers. Compared with two existing gene-based methods (t-test and lasso), the method achieves significantly higher accuracy on both synthetic datasets and two real breast cancer datasets. Furthermore, six identified genes susceptible to breast cancer (TSPYL5, CD55, CCNE2, DCK, BBC3, and MUC1) were verified through literature mining, GO analysis, and pathway functional enrichment analysis. Among these oncogenes, TSPYL5 and CCNE2 are already known prognostic biomarkers in breast cancer, literature evidence suggests that CD55 may play an important role in breast cancer prognosis, and the other three genes are newly discovered breast cancer biomarkers. More generally, the differential network schema can be extended to other complex diseases to detect disease-associated genes.
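The subtraction and hub-ranking steps can be illustrated with a small sketch. The abstract does not give the details of the boosting regression with likelihood score and informative prior, so the network builder below is a swapped-in stand-in (thresholded absolute Pearson correlation), not the authors' algorithm. The function names `build_network`, `differential_network`, and `rank_hubs`, the 0.8 threshold, and the toy data are all hypothetical. For binary networks, "subtracting" them is taken here to mean keeping the nonzero entries of A_case − A_control, i.e. edges present in exactly one network; the differential-expression filter applied to hub genes in the paper is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return cov / (sx * sy)


def build_network(expr, threshold=0.8):
    """HYPOTHETICAL stand-in for the paper's boosting-regression builder:
    connect genes g and h when |corr(g, h)| >= threshold.
    expr maps gene name -> expression values across samples."""
    genes = sorted(expr)
    edges = set()
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                edges.add(frozenset((g, h)))
    return edges


def differential_network(case_edges, control_edges):
    """Subtract the networks: the nonzero entries of A_case - A_control,
    i.e. edges gained in cases plus edges lost relative to controls."""
    return case_edges ^ control_edges


def rank_hubs(diff_edges):
    """Rank genes by degree in the differential network (ties by name)."""
    degree = {}
    for edge in diff_edges:
        for g in edge:
            degree[g] = degree.get(g, 0) + 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Toy data (hypothetical): gene B is rewired between the two conditions.
    case = {"A": [6, 6, 4, 4], "B": [7, 7, 3, 3], "C": [4, 4, 6, 6], "D": [6, 4, 6, 4]}
    control = {"A": [6, 6, 4, 4], "B": [6, 4, 4, 6], "C": [4, 4, 6, 6], "D": [6, 4, 6, 4]}
    diff = differential_network(build_network(case), build_network(control))
    print(rank_hubs(diff))  # [('B', 2), ('A', 1), ('C', 1)]
```

In this toy example the A–C edge appears in both conditions and cancels out, so gene B, whose connections change, surfaces as the top differential hub even though every gene has some neighbor in the case network.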