Gaiteri Chris, Chen Mingming, Szymanski Boleslaw, Kuzmin Konstantin, Xie Jierui, Lee Changkyu, Blanche Timothy, Chaibub Neto Elias, Huang Su-Chun, Grabowski Thomas, Madhyastha Tara, Komashko Vitalina
Rush University Medical Center, Alzheimer's Disease Center, Chicago, IL.
Allen Institute for Brain Science, Modeling, Analysis and Theory Group, Seattle, WA.
Sci Rep. 2015 Nov 9;5:16361. doi: 10.1038/srep16361.
Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods detect only non-overlapping communities. The communities they detect may also be unstable and difficult to replicate, because traditional methods are sensitive to noise and parameter settings. These shortcomings limit our ability to detect biological communities, and therefore our ability to understand biological functions. To address these limitations and detect robust overlapping biological communities, we propose an unorthodox clustering method called SpeakEasy, which identifies communities using top-down and bottom-up approaches simultaneously. Specifically, nodes join communities based on their local connections as well as global information about the network structure. This method can quantify the stability of each community, automatically identify the number of communities, and quickly cluster networks with hundreds of thousands of nodes. SpeakEasy shows top performance on synthetic clustering benchmarks and accurately identifies meaningful biological communities in a range of datasets, including gene microarrays, protein interactions, sorted cell populations, electrophysiology and fMRI brain imaging.
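The abstract does not spell out how local connections and global network information are combined, so the following is only a minimal sketch of one way such a rule could look. It assumes a label-propagation scheme in which each node adopts the label that is most over-represented among its neighbours relative to that label's frequency in the whole network. The function name `propagate_labels`, its `rounds` and `seed` parameters, and `toy_network` are hypothetical, and the sketch assigns a single community to each node, so it does not reproduce the overlapping output or the stability estimates of SpeakEasy.

```python
import random
from collections import Counter


def propagate_labels(adjacency, rounds=20, seed=0):
    """Toy label propagation that weighs local against global label frequency.

    adjacency maps each node to a list of its neighbours (undirected graph).
    This is an illustrative assumption, not the published SpeakEasy procedure.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts alone
    nodes = sorted(adjacency)
    total = len(labels)

    for _ in range(rounds):
        global_counts = Counter(labels.values())  # top-down information
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            neighbours = adjacency[node]
            if not neighbours:
                continue
            # Bottom-up information: labels currently held by the neighbours.
            local_counts = Counter(labels[n] for n in neighbours)
            degree = len(neighbours)

            def score(label):
                # Observed local count minus the count expected if labels
                # were spread according to their global frequency.
                expected = degree * global_counts[label] / total
                return local_counts[label] - expected

            # Sorting first makes tie-breaking deterministic.
            best = max(sorted(local_counts), key=score)
            if best != labels[node]:
                global_counts[labels[node]] -= 1
                global_counts[best] += 1
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy network: two 4-node cliques with no edges between them.
toy_network = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2],
    4: [5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
communities = propagate_labels(toy_network)
print(communities)
```

Because a node can only take a label that one of its neighbours already holds, the two disconnected cliques in `toy_network` can never share a label. Subtracting the expected count is the design choice that brings in global structure: a label that is common everywhere gets less credit than one concentrated in a node's neighbourhood.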