
Discovering approximate-associated sequence patterns for protein-DNA interactions.

Affiliation

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong.

Publication information

Bioinformatics. 2011 Feb 15;27(4):471-8. doi: 10.1093/bioinformatics/btq682. Epub 2010 Dec 30.

Abstract

MOTIVATION

The binding between transcription factors (TFs) and transcription factor binding sites (TFBSs) is a fundamental protein-DNA interaction in transcriptional regulation, and extensive efforts have been made to understand it better. Recent mining of exact TF-TFBS-associated sequence patterns (rules) has shown great potential and achieved very promising results. However, exact rules cannot handle the variations in real data, so they yield only a limited number of informative rules. In this article, we generalize the exact rules to approximate ones for both TFs and TFBSs, which is essential for capturing biological variation.

RESULTS

A progressive approach is proposed to handle the approximation while alleviating the computational requirements. First, similar TFBSs are grouped from the available TF-TFBS data (TRANSFAC database). Second, approximate and highly conserved binding cores are discovered from the TF sequences corresponding to each TFBS group, using a customized algorithm developed for this specific objective. The approximate TF-TFBS rules are then obtained by associating the grouped TFBS consensuses with the TF cores. The discovered rules are evaluated by matching (verifying) them against actual protein-DNA binding pairs from Protein Data Bank (PDB) 3D structures. The approximate results yield many more verified rules and up to 300% better verification ratios than the exact ones. The customized algorithm achieves over 73% better verification ratios than traditional methods. Approximate rules (64-79%) are shown to be statistically significant. Detailed variation analysis and conservation verification on NCBI records demonstrate that the approximate rules accurately reveal both flexible and specific protein-DNA interactions. The approximate TF-TFBS rules discovered show a strong capability to generalize and to uncover more informative binding rules.
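The three-step pipeline described above (group similar TFBSs, find approximate TF cores per group, associate cores with group consensuses) can be illustrated with a minimal sketch. This is not the authors' customized algorithm: the greedy Hamming-distance grouping, the k-mer support criterion, every function name, every threshold, and the toy sequences below are assumptions chosen only to make the idea concrete.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def group_tfbs(pairs, max_dist):
    """Greedy grouping (assumed): a site joins the first group whose seed
    has the same length and is within max_dist mismatches."""
    groups = []  # list of (seed_site, [(tf_seq, site), ...])
    for tf_seq, site in pairs:
        for seed, members in groups:
            if len(seed) == len(site) and hamming(seed, site) <= max_dist:
                members.append((tf_seq, site))
                break
        else:
            groups.append((site, [(tf_seq, site)]))
    return groups


def consensus(sites):
    """Column-wise majority consensus of equal-length sites."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*sites))


def occurs_approx(kmer, seq, max_mismatch):
    """True if kmer matches some window of seq within max_mismatch."""
    k = len(kmer)
    return any(
        hamming(kmer, seq[i:i + k]) <= max_mismatch
        for i in range(len(seq) - k + 1)
    )


def approximate_cores(tf_seqs, k, max_mismatch, min_support):
    """k-mers taken from the sequences that occur approximately in at
    least a min_support fraction of the sequences."""
    candidates = {s[i:i + k] for s in tf_seqs for i in range(len(s) - k + 1)}
    return [
        kmer
        for kmer in sorted(candidates)
        if sum(occurs_approx(kmer, s, max_mismatch) for s in tf_seqs)
        / len(tf_seqs) >= min_support
    ]


def mine_rules(pairs, site_dist=1, k=5, core_mismatch=1, min_support=1.0):
    """Associate each approximate TF core with its group's TFBS consensus."""
    rules = []
    for _, members in group_tfbs(pairs, site_dist):
        if len(members) < 2:  # assumed filter: no support to measure
            continue
        tf_seqs = [tf for tf, _ in members]
        sites = [site for _, site in members]
        for core in approximate_cores(tf_seqs, k, core_mismatch, min_support):
            rules.append((core, consensus(sites)))
    return rules


# Hypothetical toy data: (TF amino-acid fragment, TFBS) pairs, not real records.
pairs = [
    ("MKRARNTEAA", "TGACTCA"),
    ("LLKRSRNTQAA", "TGAGTCA"),
    ("GGKRARNSEVV", "TGACTCA"),
    ("PPHHWWYYCC", "CACGTG"),
]
rules = mine_rules(pairs)
for core, site in rules:
    print(core, "<->", site)
```

Grouping the sites first keeps the core search local to each group, which is one plausible reading of how a progressive approach reduces computation. A real implementation would need a principled similarity measure, variable-width cores, and the statistical significance testing mentioned in the abstract.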

