Using sequence-specific chemical and structural properties of DNA to predict transcription factor binding sites.

Author information

Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, United States of America.

Publication information

PLoS Comput Biol. 2010 Nov 18;6(11):e1001007. doi: 10.1371/journal.pcbi.1001007.

Abstract

An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
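For orientation, the conventional baseline that the abstract contrasts SiteSleuth with, a position-specific weight matrix built from known binding sites, can be sketched as follows. This is not SiteSleuth itself, nor the exact scoring used by Match, MATRIX SEARCH, or Berg and von Hippel. It is a minimal log-odds sketch that assumes a uniform background, a pseudocount of 1, and hypothetical toy sites. The function names `build_pwm`, `score_window`, and `scan` are illustrative only.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=None):
    """Build a log-odds weight matrix from aligned, equal-length sites."""
    if background is None:
        # Assumption: uniform base composition; a real genome would differ.
        background = {base: 0.25 for base in BASES}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(width):
        counts = Counter(site[pos] for site in sites)
        # Log2 ratio of smoothed column frequency to background frequency.
        column = {
            base: math.log2(((counts[base] + pseudocount) / total) / background[base])
            for base in BASES
        }
        pwm.append(column)
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(pwm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical toy binding sites, not taken from RegulonDB.
toy_sites = ["TGACA", "TGACT", "TGTCA", "TGACA", "AGACA"]
pwm = build_pwm(toy_sites)
hits = scan(pwm, "CCTGACAGG", threshold=3.0)
```

A scorer of this kind sees only base identities at each position. The approach described in the abstract instead maps each candidate site to a vector of chemical and structural features and trains a binary classifier on those vectors.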

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f3a/2987836/44bb65764f93/pcbi.1001007.g001.jpg
