Department of Biological Sciences, University of Maryland Baltimore County (UMBC), 1000 Hilltop Circle, Baltimore, MD 21250, USA.
Nucleic Acids Res. 2014 Jan;42(Database issue):D156-60. doi: 10.1093/nar/gkt1123. Epub 2013 Nov 14.
The influx of high-throughput data and the need for complex models to describe the interaction of prokaryotic transcription factors (TFs) with their target sites pose new challenges for TF-binding site databases. CollecTF (http://collectf.umbc.edu) compiles data on experimentally validated, naturally occurring TF-binding sites across the Bacteria domain, placing a strong emphasis on the transparency of the curation process, the quality and availability of the stored data, and fully customizable access to its records. CollecTF integrates multiple sources of data automatically and openly, allowing users to dynamically redefine binding motifs and their experimental support base. Data quality and currency are fostered in CollecTF by adopting a sustainable model that encourages direct author submissions in combination with in-house validation and curation of published literature. CollecTF entries are periodically submitted to NCBI for integration into RefSeq complete genome records as link-out features, maximizing the visibility of the data and enriching the annotation of RefSeq files with regulatory information. Seeking to facilitate comparative genomics and machine-learning analyses of regulatory interactions, in its initial release CollecTF provides domain-wide coverage of two TF families (LexA and Fur), as well as extensive representation for a clinically important bacterial family, the Vibrionaceae.
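The idea of dynamically redefining a binding motif and its experimental support base can be illustrated with a minimal Python sketch. It is not CollecTF code and does not use the CollecTF interface: the site sequences, the technique labels, and the function names (`build_count_matrix`, `consensus`, `motif_for`) are all hypothetical, chosen only to show how restricting the accepted experimental evidence changes the set of sites from which a motif is built.

```python
from collections import Counter

# Hypothetical records: (binding-site sequence, experimental technique).
# These are illustrative values, not data taken from CollecTF.
SITES = [
    ("ACGTAC", "EMSA"),
    ("ACGTTC", "EMSA"),
    ("ACGAAC", "ChIP-seq"),
    ("TCGTAC", "DNase footprinting"),
]

BASES = "ACGT"


def build_count_matrix(sequences):
    """Return one {base: count} dict per position for equal-length sites."""
    if not sequences:
        raise ValueError("no sequences selected")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(seq[i] for seq in sequences)
        matrix.append({base: counts.get(base, 0) for base in BASES})
    return matrix


def consensus(matrix):
    """Most frequent base per position; ties resolve in A, C, G, T order."""
    return "".join(max(BASES, key=lambda b: column[b]) for column in matrix)


def motif_for(records, accepted_techniques):
    """Rebuild the motif from sites supported by the accepted techniques."""
    selected = [seq for seq, tech in records if tech in accepted_techniques]
    return build_count_matrix(selected)


if __name__ == "__main__":
    all_techniques = {tech for _, tech in SITES}
    print(consensus(motif_for(SITES, all_techniques)))
    print(motif_for(SITES, {"EMSA"})[4])
```

Under these assumptions, narrowing the accepted techniques to a single assay shrinks the supporting site set, so the per-position counts (and potentially the consensus) change accordingly.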