Mao Shihong, Dong Guozhu
Department of Computer Science and Engineering, Wright State University, USA.
J Bioinform Comput Biol. 2005 Dec;3(6):1263-80. doi: 10.1142/s0219720005001545.
It is commonly believed that suitable analysis of microarray gene expression profile data can lead to a better understanding of diseases and to better ways to diagnose and treat them. To achieve those goals, it is of interest to discover from such data the gene interaction networks, and perhaps even the pathways, underlying given diseases. In this paper, we consider methods for efficiently discovering highly differentiative gene groups (HDGGs), which may provide insights into gene interaction networks. HDGGs are groups of genes that completely or nearly completely characterize the diseased or normal tissues. Discovering HDGGs is challenging because of the high dimensionality of the data.
Our methods are based on the novel concept of gene clubs. A gene club consists of a set of genes that have high potential to interact with one another. The methods can (i) efficiently discover signature HDGGs, which completely characterize the diseased and the normal tissues respectively, (ii) find the strongest or near-strongest HDGGs containing any given gene, and (iii) find much stronger HDGGs than previous methods. As part of the experimental evaluation, the methods are applied to data for colon, prostate, ovarian, and breast cancer and for leukemia, among others. Some of the genes in the extracted signature HDGGs have known biological functions, and some have attracted little attention in biology and medicine. We hope that appropriate study of these genes can lead to medical breakthroughs. Some HDGGs for colon and prostate cancers are listed here. The website listed below contains HDGGs for the other cancers.
The HDGG software is implemented in C++ and runs on Unix or Windows platforms. The code is available at: http://www.cs.wright.edu/~gdong/hdgg/.
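To make the notion of a gene group that "completely or nearly completely" characterizes one tissue class more concrete, the following is a minimal illustrative sketch in Python. It is not the authors' C++ implementation. It assumes that expression values have already been discretized into levels such as "high" and "low", and it treats a gene group as a set of required levels. The gene names `g1` and `g2`, the toy samples, and the helper names `support`, `differentiation`, and `completely_characterizes` are hypothetical. The abstract does not define the strength measure actually used, so the two per-class matching fractions below are only an assumed stand-in.

```python
from typing import Dict, List, Tuple

# Assumption: each sample maps a gene name to a discretized expression level.
Sample = Dict[str, str]
# Assumption: a gene group is expressed as the level required for each of its genes.
Pattern = Dict[str, str]


def matches(sample: Sample, pattern: Pattern) -> bool:
    """Return True if the sample shows the required level for every gene in the group."""
    return all(sample.get(gene) == level for gene, level in pattern.items())


def support(samples: List[Sample], pattern: Pattern) -> float:
    """Fraction of samples in one class that match the gene group."""
    if not samples:
        return 0.0
    return sum(matches(s, pattern) for s in samples) / len(samples)


def differentiation(diseased: List[Sample], normal: List[Sample],
                    pattern: Pattern) -> Tuple[float, float]:
    """Return (fraction of diseased samples matched, fraction of normal samples matched)."""
    return support(diseased, pattern), support(normal, pattern)


def completely_characterizes(diseased: List[Sample], normal: List[Sample],
                             pattern: Pattern) -> bool:
    """True if the group matches every diseased sample and no normal sample."""
    d, n = differentiation(diseased, normal, pattern)
    return d == 1.0 and n == 0.0


# Hypothetical toy data: two diseased and two normal samples over two genes.
DISEASED = [{"g1": "high", "g2": "low"}, {"g1": "high", "g2": "low"}]
NORMAL = [{"g1": "high", "g2": "high"}, {"g1": "low", "g2": "low"}]

if __name__ == "__main__":
    print(differentiation(DISEASED, NORMAL, {"g1": "high"}))
    print(differentiation(DISEASED, NORMAL, {"g1": "high", "g2": "low"}))
```

In this toy data, neither gene alone separates the two classes, whereas the pair does. That is the kind of group-level signal the abstract describes, and it also hints at why the search is hard: the number of candidate groups grows rapidly with the number of genes.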