Hestilow Travis J, Huang Yufei
Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX 78249, USA.
EURASIP J Bioinform Syst Biol. 2009;2009(1):195712. doi: 10.1155/2009/195712. Epub 2009 Apr 23.
A method for clustering genes from expression profiles using shape information is presented. Conventional clustering approaches such as K-means assume that genes with similar functions have similar expression levels and therefore allocate genes with similar expression levels to the same cluster. However, genes with similar functions often exhibit similar signal shapes even when their expression magnitudes are far apart. This investigation therefore studies clustering according to signal shape similarity. The shape information is captured in the form of normalized and time-scaled forward first differences, which are then subjected to variational Bayes clustering plus a non-Bayesian (Silhouette) cluster statistic. The statistic shows an improved ability to identify the correct number of clusters and to assign cluster members. Initial results for both generated test data and Escherichia coli microarray expression data, together with initial validation of the Escherichia coli results, indicate that the method shows promise for better clustering of time-series microarray data according to shape similarity.
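The shape feature described in the abstract can be illustrated with a minimal sketch. It assumes that "time-scaled" means dividing each forward difference by its sampling interval and that "normalized" means scaling the resulting vector to unit Euclidean length. The abstract does not specify the exact normalization, so both choices are assumptions, as are the function name shape_features and the example profiles. The variational Bayes clustering and Silhouette steps are not shown.

```python
import math

def shape_features(values, times):
    """Normalized, time-scaled forward first differences of one profile."""
    # Forward first difference divided by the sampling interval (time scaling).
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]
    # Unit-length normalization (an assumed choice) removes overall magnitude.
    norm = math.sqrt(sum(s * s for s in slopes))
    if norm == 0.0:
        return slopes  # flat profile: nothing to normalize
    return [s / norm for s in slopes]

# Hypothetical, unevenly spaced sampling times and expression profiles.
times = [0.0, 1.0, 2.0, 4.0]
gene_a = [1.0, 2.0, 4.0, 2.0]
gene_b = [10.0 * v + 5.0 for v in gene_a]  # same shape, much higher level
gene_c = [4.0, 2.0, 1.0, 3.0]              # similar level to gene_a, different shape

feat_a = shape_features(gene_a, times)
feat_b = shape_features(gene_b, times)
feat_c = shape_features(gene_c, times)

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

print("raw distance a-b:  ", distance(gene_a, gene_b))
print("shape distance a-b:", distance(feat_a, feat_b))
print("shape distance a-c:", distance(feat_a, feat_c))
```

Under these assumptions, gene_a and gene_b are far apart in raw expression level but map to essentially the same feature vector, whereas gene_c sits close to gene_a in level but far from it in shape. This is the property that a level-based method such as K-means on raw profiles would miss.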