Chen Shuhua, Liu Juan, Zeng Tao
School of Computer, Wuhan University, Wuhan, Hubei 430072, China.
Methods. 2015 Jul 15;83:18-27. doi: 10.1016/j.ymeth.2015.04.005. Epub 2015 Apr 15.
In microarray analysis, biclustering is used to find maximal subsets of rows and columns that satisfy some coherence criterion. The submatrices found are usually called biclusters. On the one hand, different criteria help to find different types of biclusters, so the definition of the coherence criterion is critical to a biclustering method. On the other hand, qualitative criteria lead to qualitative biclustering methods that cannot evaluate the quality of the biclusters, whereas quantitative criteria show numerically how good the mined biclusters are and are therefore more useful in real applications. Several quantitative coherence measures for linear patterns have been proposed in the bioinformatics community. However, they are either weak at finding all subtypes of linear patterns or sensitive to noise. In this work, we introduce a coherence measure for general linear patterns, the minimal mean squared error (MMSE), which is designed to evaluate biclusters with shifting, scaling, and general linear (the mixed form of shifting and scaling) correlations. Experiments on synthetic and real data sets show that the proposed method is appropriate for identifying significant general linear biclusters.
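To make the three pattern types concrete, the sketch below scores a candidate submatrix by fitting every row as a scaled and shifted copy of a reference row and reporting the smallest mean squared residual over all choices of reference. This is only an illustration of the general idea of a least-squares coherence score for linear patterns. The abstract does not give the MMSE formula, so the function name `linear_coherence_mse` and the scoring rule itself are assumptions, not the authors' definition.

```python
import numpy as np

def linear_coherence_mse(sub):
    """Illustrative score (an assumption, not the paper's MMSE formula):
    smallest mean squared residual when each row is fit as a * ref + b."""
    sub = np.asarray(sub, dtype=float)
    n_rows, n_cols = sub.shape
    best = np.inf
    for k in range(n_rows):
        ref = sub[k]
        # Design matrix [ref, 1]: each row gets its own scale a and shift b.
        design = np.column_stack([ref, np.ones(n_cols)])
        # Fit all rows at once: design @ coef approximates sub.T.
        coef, _, _, _ = np.linalg.lstsq(design, sub.T, rcond=None)
        residual = sub.T - design @ coef
        best = min(best, float(np.mean(residual ** 2)))
    return best

# Toy submatrix: one base profile plus shifted, scaled, and mixed copies.
base = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
perfect = np.vstack([base, base + 3.0, 2.0 * base, -1.5 * base + 4.0])

# The same submatrix with additive Gaussian noise.
rng = np.random.default_rng(0)
noisy = perfect + rng.normal(scale=0.5, size=perfect.shape)

print(linear_coherence_mse(perfect))
print(linear_coherence_mse(noisy))
```

A noise-free submatrix whose rows are shifted, scaled, or mixed copies of one profile scores essentially zero, while the noisy version scores strictly higher, which is the behaviour a quantitative coherence measure is expected to have.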