Hleap Jose Sergio, Blouin Christian
Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada; Department of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada.
PLoS One. 2014 Nov 19;9(11):e113438. doi: 10.1371/journal.pone.0113438. eCollection 2014.
Community structure detection is an important tool in graph analysis. One way to do it is to solve for the partition that optimizes the modularity score Q. Here it is shown that topological constraints in correlation graphs induce over-fragmentation of community structures. A refinement step for this optimization, based on Linear Discriminant Analysis (LDA) and a statistical test for significance, is proposed. In structured simulations constrained by topology, this novel approach performs better than the optimization of modularity alone. The method was also tested with two empirical datasets: the roll-call voting in the 110th US Senate constrained by geographic adjacency, and a biological dataset of 135 protein structures constrained by inter-residue contacts. The former dataset showed sub-structures in the communities that revealed a regional bias in the votes that transcends party affiliations. This is an interesting pattern given that the 110th Legislature was assumed to be a highly polarized government. The α-amylase catalytic domain dataset (the biological dataset) was analyzed with and without topological constraints (inter-residue contacts). The results without topological constraints differed from the topology-constrained results, but the LDA filtering did not change the outcome of the latter. This suggests that the LDA filtering is a robust way to resolve over-fragmentation when it is present, and that the method does not affect the results where there is no evidence of over-fragmentation.
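To make the quantity being optimized concrete, the following is a minimal sketch of the standard Newman-Girvan modularity score for a fixed partition. It assumes an undirected graph given as a symmetric adjacency matrix with non-negative weights. The function names `modularity` and `two_triangles` and the toy graph are illustrative only and are not taken from the paper. The sketch scores a partition and does not implement the optimization or the proposed LDA refinement step.

```python
def modularity(adjacency, labels):
    """Newman-Girvan modularity of a partition.

    adjacency: symmetric list-of-lists with non-negative weights (assumption).
    labels:    community label for each node, in the same order as the rows.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # equals twice the total edge weight
    if two_m == 0:
        raise ValueError("graph has no edges; modularity is undefined")

    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected under the
                # degree-preserving null model
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


def two_triangles():
    """Hypothetical toy graph: two triangles joined by a single edge."""
    edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    adjacency = [[0.0] * 6 for _ in range(6)]
    for i, j in edges:
        adjacency[i][j] = adjacency[j][i] = 1.0
    return adjacency


if __name__ == "__main__":
    graph = two_triangles()
    print(modularity(graph, [0, 0, 0, 1, 1, 1]))  # one community per triangle
    print(modularity(graph, [0, 0, 0, 0, 0, 0]))  # everything in one community
```

On this toy graph, splitting the nodes into the two triangles scores higher than keeping a single community, which is the behaviour a modularity optimizer exploits. Correlation graphs can carry negative weights, which this simple form does not handle.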