Predicting DNA-binding specificities of eukaryotic transcription factors.

Affiliation

Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Tübingen, Germany.

Publication information

PLoS One. 2010 Nov 30;5(11):e13876. doi: 10.1371/journal.pone.0013876.

Abstract

Today, annotated amino acid sequences are readily available for more and more transcription factors (TFs). Quantitative information about their DNA-binding specificities, however, is hard to obtain. Position frequency matrices (PFMs), the most widely used models for representing binding specificities, have been experimentally characterized for only a small fraction of all TFs. Even for some of the most intensively studied eukaryotic organisms (i.e., human, rat and mouse), only roughly one-sixth of all proteins with an annotated DNA-binding domain have been characterized experimentally. Here, we present a new method based on support vector regression for predicting quantitative DNA-binding specificities of TFs in different eukaryotic species. This approach estimates a quantitative measure of the PFM similarity of two proteins, based on various features derived from their protein sequences. The method is trained and tested on a dataset containing 1,239 TFs with known DNA-binding specificity, and used to predict specific DNA target motifs for 645 TFs with high accuracy.
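
To make the notion of "PFM similarity" concrete, the following is a minimal Python sketch of how two PFMs might be compared. The abstract does not state which similarity measure the authors use, so the choice here (one minus the normalized mean Euclidean distance between columns) is an assumption made purely for illustration, as are the function names `normalize_pfm`, `column_distance` and `pfm_similarity` and the toy count data. The sketch also assumes the two matrices are already aligned and have equal length.

```python
from math import sqrt

BASES = "ACGT"

def normalize_pfm(counts):
    """Turn per-position base counts into per-position base frequencies."""
    pfm = []
    for column in counts:
        total = sum(column[b] for b in BASES)
        pfm.append({b: column[b] / total for b in BASES})
    return pfm

def column_distance(p, q):
    """Euclidean distance between two frequency columns (between 0 and sqrt(2))."""
    return sqrt(sum((p[b] - q[b]) ** 2 for b in BASES))

def pfm_similarity(pfm_a, pfm_b):
    """Illustrative similarity in [0, 1]; NOT the measure used in the paper."""
    if len(pfm_a) != len(pfm_b):
        raise ValueError("this sketch assumes aligned PFMs of equal length")
    mean_dist = sum(column_distance(p, q) for p, q in zip(pfm_a, pfm_b)) / len(pfm_a)
    return 1.0 - mean_dist / sqrt(2)

# Hypothetical toy motifs, two positions each (counts of A, C, G, T per position).
pfm_a = normalize_pfm([
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 4, "G": 4, "T": 0},
])
pfm_c = normalize_pfm([
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 4, "C": 0, "G": 0, "T": 4},
])

sim_same = pfm_similarity(pfm_a, pfm_a)  # identical motifs
sim_diff = pfm_similarity(pfm_a, pfm_c)  # dissimilar motifs
```

In a setup like the one the abstract describes, a value of this kind would serve as the regression target: a support vector regression model would be trained to predict it for a pair of TFs from features of their protein sequences.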

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/465b/2994704/0eca08b04bf9/pone.0013876.g001.jpg
