footprintDB: a database of transcription factors with annotated cis elements and binding interfaces.

Affiliation

Laboratory of Computational Biology, Department of Genetics and Plant Production, Estación Experimental de Aula Dei/CSIC, Av. Montañana 1005, Zaragoza (http://www.eead.csic.es/compbio) and Fundación ARAID, Paseo María Agustín 36, Zaragoza, Spain.

Publication information

Bioinformatics. 2014 Jan 15;30(2):258-65. doi: 10.1093/bioinformatics/btt663. Epub 2013 Nov 14.

DOI: 10.1093/bioinformatics/btt663

PMID: 24234003

Abstract

MOTIVATION

Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases.

RESULTS

FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families, carried out alongside the proprietary database TRANSFAC, found that footprintDB has similar coverage of multicellular organisms while also containing bacterial regulatory data. A search engine compares queries with database entries to predict DNA motifs for input TFs or, conversely, TF sequences that might recognize input regulatory sequences. These predictions can also be extended to a single proteome chosen by the user, and results are ranked by interface similarity. Benchmark experiments with bacterial, plant and human data measured the predictive power of footprintDB searches, which correctly recovered 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had higher-than-average interface similarity, confirming the diagnostic value of this measure.
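To make the idea of ranking hits by interface similarity concrete, here is a minimal sketch. It assumes, for illustration only, that interface similarity is the percentage of DNA-contacting alignment columns at which the query and a database TF carry identical residues. The function names `interface_similarity` and `rank_hits`, the gapped-alignment input format and the exact scoring rule are all hypothetical and are not taken from the footprintDB implementation.

```python
from typing import List, Sequence, Tuple


def interface_similarity(query_aln: str, target_aln: str,
                         interface_cols: Sequence[int]) -> float:
    """Hypothetical score: % of interface columns with identical residues.

    query_aln and target_aln are rows of a pairwise alignment ('-' = gap);
    interface_cols are 0-based columns assumed to hold DNA-contacting
    residues of the database TF.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    if not interface_cols:
        return 0.0
    identical = sum(
        1 for i in interface_cols
        if query_aln[i] != "-" and query_aln[i] == target_aln[i]
    )
    return 100.0 * identical / len(interface_cols)


def rank_hits(hits: List[Tuple[str, str, str, List[int]]]) -> List[Tuple[str, float]]:
    """Rank (name, query_aln, target_aln, interface_cols) hits, best first."""
    scored = [(name, interface_similarity(q, t, cols))
              for name, q, t, cols in hits]
    # Sort by descending score; Python's sort is stable, so ties keep input order.
    return sorted(scored, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Toy alignments with made-up TF names, for illustration only.
    hits = [
        ("TF_B", "RKRGRQ", "AKAGKQ", [0, 1, 4]),
        ("TF_A", "RKRGRQ", "RKRGKQ", [0, 1, 4]),
    ]
    for name, score in rank_hits(hits):
        print(f"{name}\t{score:.1f}")
```

Restricting the comparison to interface columns reflects the intuition that DNA-contacting residues, rather than overall sequence identity, are what determine binding specificity. A real scoring scheme might instead weight contacts or use substitution matrices.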

AVAILABILITY AND IMPLEMENTATION

Web site implemented in PHP, Perl, MySQL and Apache. Freely available from http://floresta.eead.csic.es/footprintdb.