Tang Rong, Müller Hans-Georg
Division of Biostatistics, Center for Devices and Radiological Health, Food and Drug Administration, Rockville, MD 20850, USA.
Biostatistics. 2009 Jan;10(1):32-45. doi: 10.1093/biostatistics/kxn011. Epub 2008 May 22.
Current clustering methods are routinely applied to gene expression time course data to find genes with similar activation patterns and, ultimately, to understand the dynamics of biological processes. Because the dynamic unfolding of a biological process often involves the activation of genes at different rates, successful clustering in this context requires dealing with varying time and shape patterns simultaneously. This motivates combining a novel pairwise warping with a suitable clustering method to discover expression shape clusters. We develop a novel clustering method that incorporates an initial pairwise curve alignment to adjust for time variation within likely clusters. In simulations and on yeast and human fibroblast data sets, this cluster-specific time synchronization method outperforms standard clustering methods in terms of cluster quality measures. In the yeast example, the discovered clusters show high concordance with known biological processes.
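The general idea of aligning curves pairwise before clustering them can be illustrated with a small sketch. This is not the authors' algorithm, whose warping functions and clustering step are not specified in the abstract. It assumes curves sampled on a common even grid over [0, 1], a hypothetical one-parameter warping family t -> t**a searched over a small grid of exponents, and plain average-linkage agglomerative clustering. The function names `interp`, `warped_distance`, and `cluster` are illustrative only.

```python
import math


def interp(y, t):
    """Linearly interpolate curve y (sampled on an even grid over [0, 1]) at time t."""
    pos = min(max(t, 0.0), 1.0) * (len(y) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(y) - 1)
    w = pos - lo
    return (1.0 - w) * y[lo] + w * y[hi]


def warped_distance(x, y, exponents=(0.5, 0.67, 0.8, 1.0, 1.25, 1.5, 2.0)):
    """Smallest mean squared gap between x and y over time warps t -> t**a of y.

    The power-warp family and the exponent grid are assumptions made for
    illustration; they stand in for a general pairwise warping step.
    """
    n = len(x)
    grid = [k / (n - 1) for k in range(n)]
    best = float("inf")
    for a in exponents:
        gap = sum((x[k] - interp(y, grid[k] ** a)) ** 2 for k in range(n)) / n
        best = min(best, gap)
    return best


def cluster(curves, n_clusters):
    """Average-linkage agglomerative clustering on warped pairwise distances."""
    m = len(curves)
    raw = [[warped_distance(curves[i], curves[j]) for j in range(m)] for i in range(m)]
    # The warped distance is not symmetric in general, so average both directions.
    d = [[0.5 * (raw[i][j] + raw[j][i]) for j in range(m)] for i in range(m)]
    groups = [[i] for i in range(m)]
    while len(groups) > n_clusters:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                total = sum(d[i][j] for i in groups[a] for j in groups[b])
                link = total / (len(groups[a]) * len(groups[b]))
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        groups[a] = groups[a] + groups[b]  # merge the closest pair of groups
        del groups[b]
    labels = [0] * m
    for g, members in enumerate(groups):
        for i in members:
            labels[i] = g
    return labels
```

Calling `cluster(curves, 2)` on a list of equally long trajectories returns one integer label per trajectory. Curves that share a shape but unfold at different rates end up close under `warped_distance`, which is the property the time synchronization step is meant to provide.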