Chan Tak-Ming, Lo Leung-Yau, Sze-To Ho-Yin, Leung Kwong-Sak, Xiao Xinshu, Wong Man-Hon
University of California Los Angeles, Los Angeles.
IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):696-707. doi: 10.1109/TCBB.2013.60.
Understanding protein-DNA interactions, specifically the binding between transcription factors (TFs) and transcription factor binding sites (TFBSs), is crucial in deciphering gene regulation. The recently introduced associated TF-TFBS pattern discovery combines one-sided motif discovery on both the TF and the TFBS sides. Using sequences only, it identifies the short protein-DNA binding cores that are otherwise available only in high-resolution 3D structures. The discovered patterns lead to promising applications in subtype and disease analysis. While the related studies use either association rule mining or existing TFBS annotations, none has proposed a formal unified (both-sided) model to prioritize the top verifiable associated patterns. We propose unified scores and develop an effective pipeline for associated TF-TFBS pattern discovery. Our stringent instance-level evaluations show that the patterns with the top unified scores match the binding cores in 3D structures considerably better than those of previous works, with up to 90 percent of the top 20 scored patterns verified. We also introduce extended verification from literature surveys, under which the high unified scores correspond to an even higher verification percentage. The top-scored patterns are confirmed to match the known WRKY binding cores, for which no 3D structures are available, and agree well with the top binding affinities from in vivo experiments.