Spadafore Maxwell, Najarian Kayvan, Boyle Alan P
University of Michigan Medical School, 1301 Catherine, Ann Arbor, 48109-5624, USA.
University of Michigan Department of Computational Medicine and Bioinformatics, 100 Washtenaw Avenue, Ann Arbor, 48109, USA.
BMC Bioinformatics. 2017 Nov 29;18(1):530. doi: 10.1186/s12859-017-1935-y.
Transcription factors (TFs) form a complex regulatory network within the cell that is crucial to cell functioning and human health. While methods to determine where a TF binds to DNA are well established, they provide no information about how TFs interact with one another when they do bind. TFs tend to bind the genome in clusters, and current methods to identify these clusters are either limited in scope, unable to detect relationships beyond motif similarity, or not applied to TF-TF interactions.
Here, we present a proximity-based graph clustering approach to identify TF clusters using either ChIP-seq or motif search data. We use TF co-occurrence to construct a filtered, normalized adjacency matrix and use the Markov Clustering Algorithm to partition the graph while maintaining TF-cluster and cluster-cluster interactions. We then apply our graph structure beyond clustering, using it to increase the accuracy of motif-based TFBS searching for an example TF.
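The pipeline described above (proximity-based co-occurrence, a filtered and normalized adjacency matrix, then Markov clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(chrom, pos, tf)` site format, the `window` and `min_count` thresholds, and the inflation value are all assumptions chosen for illustration, and the sketch does not attempt to retain TF-cluster or cluster-cluster interactions.

```python
import numpy as np


def cooccurrence_matrix(sites, tf_names, window=50, min_count=1):
    """Count TF pairs whose sites lie within `window` bp on the same chromosome.

    `sites` is a list of (chrom, pos, tf) tuples (hypothetical input format).
    Pair counts below `min_count` are zeroed out as a simple filter.
    """
    index = {name: i for i, name in enumerate(tf_names)}
    counts = np.zeros((len(tf_names), len(tf_names)))
    ordered = sorted(sites, key=lambda s: (s[0], s[1]))
    for i, (chrom_a, pos_a, tf_a) in enumerate(ordered):
        for chrom_b, pos_b, tf_b in ordered[i + 1:]:
            # Sites are sorted, so everything after this point is too far away.
            if chrom_b != chrom_a or pos_b - pos_a > window:
                break
            if tf_a != tf_b:
                counts[index[tf_a], index[tf_b]] += 1
                counts[index[tf_b], index[tf_a]] += 1
    return np.where(counts >= min_count, counts, 0.0)


def normalize_columns(matrix):
    """Scale each column to sum to 1 (all-zero columns are left as zeros)."""
    col_sums = matrix.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    return matrix / col_sums


def markov_cluster(adjacency, inflation=2.0, max_iter=100, tol=1e-6):
    """Basic Markov clustering: alternate expansion and inflation until stable."""
    # Self-loops are a common MCL convention; assumed here, not taken from the paper.
    m = normalize_columns(adjacency + np.eye(adjacency.shape[0]))
    for _ in range(max_iter):
        prev = m
        m = m @ m                                # expansion: two-step random walk
        m = normalize_columns(m ** inflation)    # inflation: sharpen strong edges
        if np.abs(m - prev).max() < tol:
            break
    # Each row with non-zero entries lists the members of one cluster.
    clusters = []
    for row in m:
        members = frozenset(np.flatnonzero(row > 1e-8).tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    tfs = ["A", "B", "C", "D", "E"]
    toy_sites = [
        ("chr1", 100, "A"), ("chr1", 110, "B"), ("chr1", 120, "C"),
        ("chr1", 5000, "D"), ("chr1", 5010, "E"),
    ]
    adjacency = cooccurrence_matrix(toy_sites, tfs)
    for cluster in markov_cluster(adjacency):
        print(sorted(tfs[i] for i in cluster))
```

In this toy input the two groups of sites are far apart, so the adjacency matrix is block diagonal and the clustering separates them. On real ChIP-seq or motif data the window size, filter threshold, and inflation parameter would all need tuning, and a sparse-matrix implementation would likely be preferable.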
We show that our method produces small, manageable clusters that encapsulate many known, experimentally validated transcription factor interactions and that our method is capable of capturing interactions that motif similarity methods might miss. Our graph structure is able to significantly increase the accuracy of motif TFBS searching, demonstrating that the TF-TF connections within the graph correlate with biological TF-TF interactions.
The interactions identified by our method correspond to biological reality and allow for fast exploration of TF clustering and regulatory dynamics.