Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China.
PLoS One. 2012;7(4):e33393. doi: 10.1371/journal.pone.0033393. Epub 2012 Apr 4.
Identifying disease genes is one of the most important and challenging problems in biomedicine and genomics. In this study, we developed a computational method to identify colorectal cancer-related genes based on (i) gene expression profiles and (ii) shortest path analysis of functional protein association networks. The former has long been used to select differentially expressed genes as disease genes, while the latter has been widely used to study disease mechanisms. Using existing protein-protein interaction data from STRING (Search Tool for the Retrieval of Interacting Genes), we constructed a weighted functional protein association network. Using the mRMR (Maximum Relevance Minimum Redundancy) approach, we identified six genes whose expression profiles can distinguish colorectal tumors from normal adjacent colonic tissues. Using the shortest path approach, we then found an additional 35 genes, some of which have been reported to be relevant to colorectal cancer and some of which are very likely to be relevant to it. Interestingly, the genes identified from both the gene expression profiles and the functional protein association network include more cancer genes than the genes identified from the gene expression profiles alone. These genes also show greater functional similarity to the reported colorectal cancer genes than the genes identified from the gene expression profiles alone. Together, these results indicate that the method presented in this paper is quite promising. It may become a useful tool for identifying colorectal cancer genes, or at least play a complementary role to existing methods. The method can also be applied to identify genes associated with other diseases.
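The shortest path step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the gene names and scores are made up, the helper names (`build_graph`, `shortest_path`, `genes_on_shortest_paths`) are hypothetical, and the conversion of a STRING-style confidence score (0 to 1000) into an edge weight of `1000 - score` is an assumption chosen so that stronger associations give shorter edges. The sketch collects the intermediate genes that lie on the shortest paths between each pair of seed genes.

```python
import heapq
from itertools import combinations


def build_graph(scored_edges):
    """Build an undirected weighted graph from (gene_a, gene_b, score) triples.

    Assumption: weight = 1000 - score, so high-confidence links are short.
    """
    graph = {}
    for a, b, score in scored_edges:
        weight = 1000 - score
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list of one shortest path, or None."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            new_dist = d + weight
            if new_dist < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_dist
                prev[neighbor] = node
                heapq.heappush(heap, (new_dist, neighbor))
    if target not in dist:
        return None
    # Walk back from the target to the source to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def genes_on_shortest_paths(graph, seeds):
    """Collect non-seed genes lying on a shortest path between any two seeds."""
    found = set()
    for s, t in combinations(sorted(seeds), 2):
        path = shortest_path(graph, s, t)
        if path:
            found.update(path[1:-1])
    return found - set(seeds)


# Toy data: hypothetical genes and scores, for illustration only.
edges = [
    ("SEED_A", "GENE_X", 900),
    ("GENE_X", "SEED_B", 950),
    ("SEED_A", "GENE_Y", 400),
    ("GENE_Y", "SEED_B", 400),
    ("SEED_B", "GENE_Z", 800),
]
graph = build_graph(edges)
candidates = genes_on_shortest_paths(graph, {"SEED_A", "SEED_B"})
print(candidates)
```

In this toy network the route through `GENE_X` has a total weight of 150, against 1200 for the route through `GENE_Y`, so only `GENE_X` is reported as a candidate. In the setting of the abstract, the seeds would correspond to the genes selected from the expression profiles, and the candidates to the additional genes found on the network.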