Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63108, USA.
Bioinformatics. 2010 Nov 1;26(21):2672-7. doi: 10.1093/bioinformatics/btq501. Epub 2010 Aug 31.
Computational techniques for microbial genomic sequence analysis are becoming increasingly important. With next-generation sequencing technology and the Human Microbiome Project underway, sequencing capacity now far outpaces the rate at which organisms of interest can be studied experimentally. Most related computational work has focused on sequence assembly, gene annotation and metabolic network reconstruction. We have developed a method that relies primarily on available sequence data to determine prokaryotic transcription factor (TF) binding specificities.
Specificity-determining residues (critical residues) were identified from crystal structures of DNA-protein complexes, and TFs sharing the same critical residues were grouped into specificity classes. For each class, the putative binding regions were defined as the promoter of each TF's own gene (autoregulation) together with the promoters of the operons immediately upstream and downstream of it. MEME was used to find putative motifs within each class separately. Tests on the LacI and TetR TF families against RegulonDB-annotated sites showed prediction sensitivities of 86% and 80%, respectively.
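The grouping and region-selection steps described above can be illustrated with a minimal Python sketch. It assumes the critical residues have already been extracted from the structures and concatenated into a short signature string per TF, and it models the genome as a simple list of operon promoters in genomic order, so "upstream" and "downstream" here mean the neighbouring list positions (strand is ignored). The `TF` class, its fields, the function names and the toy identifiers are all hypothetical; the sketch stops at producing the promoter set per class, which would then be passed to MEME, and does not reproduce the authors' implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TF:
    name: str               # hypothetical TF identifier
    operon_index: int       # position of the TF's own operon in a simplified, ordered operon list
    critical_residues: str  # residues at the specificity-determining positions, concatenated


def group_by_specificity(tfs):
    """Group TFs whose critical-residue signatures are identical into one specificity class."""
    classes = defaultdict(list)
    for tf in tfs:
        classes[tf.critical_residues].append(tf)
    return dict(classes)


def putative_regions(tf_class, promoters):
    """Collect, in genomic order, the promoters of each TF's own operon (autoregulation)
    and of the operons immediately upstream and downstream of it.

    Simplifying assumption: neighbours are adjacent entries in `promoters`, strand ignored.
    """
    indices = set()
    for tf in tf_class:
        for idx in (tf.operon_index - 1, tf.operon_index, tf.operon_index + 1):
            if 0 <= idx < len(promoters):  # skip neighbours past either end of the list
                indices.add(idx)
    return [promoters[i] for i in sorted(indices)]


def sensitivity(predicted, annotated):
    """Fraction of annotated sites recovered by the predictions: TP / (TP + FN)."""
    annotated = set(annotated)
    if not annotated:
        raise ValueError("no annotated sites to evaluate against")
    return len(annotated & set(predicted)) / len(annotated)


# Toy usage with made-up identifiers (not real TFs or RegulonDB data).
tfs = [TF("tfA", 1, "KSR"), TF("tfB", 4, "KSR"), TF("tfC", 7, "QNT")]
promoters = [f"P{i}" for i in range(9)]  # P0 .. P8
classes = group_by_specificity(tfs)
ksr_regions = putative_regions(classes["KSR"], promoters)
qnt_regions = putative_regions(classes["QNT"], promoters)
toy_sensitivity = sensitivity(["s1", "s2", "s3"], ["s1", "s2", "s4", "s5"])
```

Pooling the regions of every TF in a class before motif finding reflects the idea in the abstract that TFs with identical critical residues should recognise similar sites, so their combined promoter set gives the motif finder more instances of one motif to work with.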