Li Ai, Horvath Steve
Department of Human Genetics, University of California, Los Angeles, CA 90095, USA.
BMC Res Notes. 2009 Jul 20;2:142. doi: 10.1186/1756-0500-2-142.
Many clustering procedures accept only a pairwise dissimilarity or distance measure between objects. We propose a clustering method that accepts a multi-point dissimilarity measure d(i1, i2, ..., iP), where the number of points P can be larger than 2. The work is motivated by gene network analysis, where clusters correspond to modules of highly interconnected nodes. Here, we define modules as clusters of network nodes with high multi-node topological overlap. The topological overlap measure is a robust measure of interconnectedness based on shared network neighbors. In previous work, we showed that the multi-node topological overlap measure yields biologically meaningful results when used as input to network neighborhood analysis.
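To make the idea of overlap through shared neighbors concrete, the following minimal sketch computes the commonly used pairwise (two-node) topological overlap for an unweighted network. It is an illustration only and does not reproduce the multi-node measure or the MTOM software; the function name `pairwise_tom` and the toy adjacency matrix are assumptions made for this example.

```python
def pairwise_tom(adj):
    """Pairwise topological overlap for a symmetric 0/1 adjacency matrix.

    Assumes adj[i][i] == 0. Uses the common two-node form
        (shared_neighbors + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    not the multi-node generalization discussed in the abstract.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]  # k_i: number of direct neighbors
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        tom[i][i] = 1.0  # a node overlaps completely with itself
        for j in range(i + 1, n):
            # Count nodes u (other than i and j) adjacent to both i and j.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            value = (shared + adj[i][j]) / (min(degree[i], degree[j]) + 1 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


# Hypothetical toy network: nodes 0, 1, 2 form a triangle; node 3 hangs off node 2.
toy_adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_tom = pairwise_tom(toy_adj)
```

In this toy network, nodes 0 and 3 are not directly connected but share neighbor 2, so their overlap is nonzero. A dissimilarity can then be taken as one minus the overlap.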
We adapt network neighborhood analysis to module detection. We propose the Module Affinity Search Technique (MAST), a generalized version of the Cluster Affinity Search Technique (CAST) that can accommodate a multi-node dissimilarity measure. Clusters grow around user-defined or automatically chosen seeds (e.g. hub nodes). We propose both local and global stopping rules for cluster growth. We use several simulations and a gene co-expression network application to argue that the MAST approach leads to biologically meaningful results. We compare MAST with hierarchical clustering and partitioning around medoids (PAM) clustering.
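The general idea of growing a cluster around a seed under a multi-point dissimilarity can be sketched as follows. This is a simplified greedy illustration, not the MAST algorithm as published: the function names, the threshold-based local stopping rule, the size cap, and the toy `diameter` dissimilarity are all assumptions made for this example.

```python
def grow_cluster(seed, nodes, multi_dissim, max_dissim, max_size):
    """Greedily grow a cluster around `seed` (hypothetical sketch).

    multi_dissim takes a tuple of nodes of any length and returns one number,
    i.e. a multi-point dissimilarity d(i1, ..., iP).
    Growth stops when the best candidate would push the dissimilarity above
    max_dissim (an assumed local stopping rule) or when max_size is reached.
    """
    cluster = [seed]
    candidates = [v for v in nodes if v != seed]
    while candidates and len(cluster) < max_size:
        # Pick the candidate whose addition keeps the joint dissimilarity lowest.
        best = min(candidates, key=lambda v: multi_dissim(tuple(cluster) + (v,)))
        if multi_dissim(tuple(cluster) + (best,)) > max_dissim:
            break
        cluster.append(best)
        candidates.remove(best)
    return cluster


# Toy data: 1-D positions standing in for network nodes (illustrative only).
positions = {"a": 0.0, "b": 0.1, "c": 0.2, "x": 5.0, "y": 5.1}


def diameter(points):
    """Toy multi-point dissimilarity: spread of the points' positions."""
    values = [positions[p] for p in points]
    return max(values) - min(values)


module = grow_cluster("a", list(positions), diameter, max_dissim=1.0, max_size=10)
print(module)
```

Because `multi_dissim` is evaluated on the whole candidate cluster rather than on pairs, any multi-node measure (for example, one minus a multi-node topological overlap) could be plugged in without changing the growth loop.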
Our flexible module detection method is implemented in the MTOM software, which can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/MTOM/