Teng Li, Chan Laiwan
Room 1013, HSB Engineering Building, The Chinese University of Hong Kong, NT, Hong Kong.
J Integr Bioinform. 2008 Aug 25;5(2):105. doi: 10.2390/biecoll-jib-2008-105.
Traditional analysis of gene expression profiles uses clustering to find groups of coexpressed genes that have similar expression patterns. However, clustering is time consuming and can be difficult for very large datasets. We propose the idea of Discovering Distinct Patterns (DDP) in gene expression profiles. Because the patterns shown by gene expression reveal the underlying regulatory mechanisms, it is important to find all the different patterns present in a dataset when little prior knowledge is available. Doing so is also a helpful start before taking on further analysis. We propose an algorithm for DDP that iteratively picks out pairs of gene expression patterns with the largest dissimilarities. This method can also be used as a preprocessing step to initialize centers for clustering methods such as K-means. Experiments on both a synthetic dataset and real gene expression datasets show that our method is efficient and very effective at finding distinct patterns that have gene functional significance.
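The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way the iterative pair-picking idea could look. Everything specific in it is an assumption made for illustration: the dissimilarity measure (1 minus Pearson correlation), the `threshold` parameter, the rule for discarding genes that resemble an already picked pattern, and the function names `correlation_dissimilarity` and `pick_distinct_patterns`. It also assumes that no expression profile is constant, since correlation is undefined for such rows.

```python
import numpy as np


def correlation_dissimilarity(profiles):
    """Pairwise dissimilarity between rows, assumed here to be 1 - Pearson r."""
    return 1.0 - np.corrcoef(profiles)


def pick_distinct_patterns(profiles, threshold=0.5):
    """Return row indices of mutually dissimilar expression profiles.

    Hypothetical sketch: repeatedly pick the most dissimilar remaining pair,
    then drop every gene that lies within `threshold` of either picked gene.
    """
    profiles = np.asarray(profiles, dtype=float)
    dissim = correlation_dissimilarity(profiles)
    remaining = np.arange(len(profiles))
    picked = []
    while len(remaining) >= 2:
        sub = dissim[np.ix_(remaining, remaining)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[i, j] <= threshold:
            break  # the remaining genes all resemble one another
        a, b = remaining[i], remaining[j]
        picked.extend([int(a), int(b)])
        # Keep only genes that are dissimilar to both newly picked patterns.
        keep = (dissim[a, remaining] > threshold) & (dissim[b, remaining] > threshold)
        remaining = remaining[keep]
    if len(remaining) > 0:
        # Leftover genes differ from every picked pattern but not from each
        # other, so a single representative is taken for them.
        picked.append(int(remaining[0]))
    return picked


# Small synthetic example: rising, falling, and peaked profiles.
t = np.linspace(0.0, 1.0, 8)
demo = np.vstack([t, 2 * t + 1, 1 - t, np.sin(np.pi * t), 3 * (1 - t)])
demo_picked = pick_distinct_patterns(demo)
print(demo_picked)
```

Under these assumptions, the rows selected by `pick_distinct_patterns` could then be passed as initial centers to a K-means implementation, in line with the preprocessing use mentioned in the abstract. The full pairwise matrix costs memory quadratic in the number of genes, so this sketch is not meant to reflect the efficiency reported for the authors' method.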