Getz G, Levine E, Domany E
Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel.
Proc Natl Acad Sci U S A. 2000 Oct 24;97(22):12079-84. doi: 10.1073/pnas.210134797.
We present a coupled two-way clustering approach to gene microarray data analysis. The main idea is to identify subsets of the genes and samples, such that when one of these is used to cluster the other, stable and significant partitions emerge. The search for such subsets is a computationally complex task. We present an algorithm, based on iterative clustering, that performs such a search. This analysis is especially suitable for gene microarray data, where the contributions of a variety of biological mechanisms to the gene expression levels are entangled in a large body of experimental data. The method was applied to two gene microarray data sets, on colon cancer and leukemia. By identifying relevant subsets of the data and focusing on them we were able to discover partitions and correlations that were masked and hidden when the full dataset was used in the analysis. Some of these partitions have clear biological interpretation; others can serve to identify possible directions for future research.
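The iterative search described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say which clustering routine is used at each step, so a simple stand-in is swapped in: connected components of the graph that links objects whose Pearson correlation exceeds a threshold. "Stable and significant" is reduced to a minimum cluster size. All names (`pearson`, `cluster`, `ctwc`, `threshold`, `min_size`, `max_rounds`) are hypothetical. The coupling works like this. The algorithm keeps a pool of gene subsets and a pool of sample subsets. For every unvisited pair, it clusters the genes using only the chosen samples as features, and the samples using only the chosen genes. Any new clusters go back into the pools, and the loop repeats until no unvisited pairs remain.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cluster(objects, vectors, threshold, min_size):
    """Hypothetical stand-in clustering: union-find over pairs whose correlation
    exceeds `threshold`; keep only components with at least `min_size` members."""
    parent = {o: o for o in objects}

    def find(o):
        while parent[o] != o:
            parent[o] = parent[parent[o]]  # path halving
            o = parent[o]
        return o

    for a, b in combinations(objects, 2):
        if pearson(vectors[a], vectors[b]) > threshold:
            parent[find(a)] = find(b)
    groups = {}
    for o in objects:
        groups.setdefault(find(o), set()).add(o)
    return [frozenset(g) for g in groups.values() if len(g) >= min_size]


def ctwc(matrix, threshold=0.8, min_size=2, max_rounds=5):
    """Sketch of coupled two-way clustering on a genes x samples matrix.

    Returns the final pools of gene subsets and sample subsets (row/column indices).
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    genes_pool = {frozenset(range(n_genes))}
    samples_pool = {frozenset(range(n_samples))}
    done = set()
    for _ in range(max_rounds):
        todo = [(g, s) for g in genes_pool for s in samples_pool if (g, s) not in done]
        if not todo:
            break  # every (gene subset, sample subset) pair has been explored
        for g, s in todo:
            done.add((g, s))
            gs, ss = sorted(g), sorted(s)
            # Genes in g, described only by their expression over samples in s.
            gene_vecs = {i: [matrix[i][j] for j in ss] for i in gs}
            # Samples in s, described only by the expression of genes in g.
            sample_vecs = {j: [matrix[i][j] for i in gs] for j in ss}
            genes_pool.update(cluster(gs, gene_vecs, threshold, min_size))
            samples_pool.update(cluster(ss, sample_vecs, threshold, min_size))
    return genes_pool, samples_pool


if __name__ == "__main__":
    # Toy data (not from the paper): 4 genes x 4 samples.
    toy = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2], [8, 2, 6, 4]]
    genes, samples = ctwc(toy)
    print(sorted(map(sorted, genes)), sorted(map(sorted, samples)))
```

The pools only grow, and `max_rounds` bounds the work. This reflects the point in the abstract that an exhaustive search over subsets is computationally complex, so an iterative, heuristic exploration is used instead.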