Department of Molecular Genetics, University of Groningen, Groningen, The Netherlands.
Nucleic Acids Res. 2010 Jul;38(Web Server issue):W647-51. doi: 10.1093/nar/gkq365. Epub 2010 May 12.
Mining bacterial genomes for bacteriocins is challenging because these antimicrobial peptides are generally small and highly diverse in both structure and sequence. Major progress in antimicrobial peptide research and the ever-increasing quantities of genomic data, ranging from finished and unfinished genomes to metagenomic data, led us to develop the significantly improved genome mining software BAGEL2 as the successor to our previous BAGEL software. BAGEL2 identifies putative bacteriocins on the basis of conserved domains, physical properties and the presence of biosynthesis, transport and immunity genes in their genomic context. The software supports parameter-free, class-specific mining and has high-throughput capabilities. Besides building an expert-validated bacteriocin database, we describe the development of novel Hidden Markov Models (HMMs) and the interpretation of combinations of HMMs via simple decision rules for prediction of bacteriocin (sub-)classes. Furthermore, the genetic context is automatically annotated based on PFAM domains, alone or in combination, and on databases of known context genes. The scoring system was fine-tuned using expert knowledge on data derived from screening all bacterial genomes currently available at the NCBI. BAGEL2 is freely accessible at http://bagel2.molgenrug.nl.
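The abstract mentions interpreting combinations of HMM hits via simple decision rules to predict bacteriocin (sub-)classes. A minimal sketch of that general idea is given below. It is not BAGEL2's actual rule set: the HMM names, the class labels and the rule ordering are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One decision rule over HMM hits (hypothetical, not taken from BAGEL2)."""
    label: str                        # class assigned when the rule matches
    required: frozenset               # HMMs that must all have hit
    forbidden: frozenset = frozenset()  # HMMs that must not have hit


# Assumed example rules, checked in order; the first match wins.
# The model names are illustrative placeholders only.
RULES = [
    Rule("class I", frozenset({"hmm_leader_A", "hmm_core_A"})),
    Rule("class II", frozenset({"hmm_leader_B"}), frozenset({"hmm_core_A"})),
]


def classify(hits):
    """Return the label of the first rule satisfied by the set of HMM hits."""
    hits = frozenset(hits)
    for rule in RULES:
        # A rule matches when every required HMM hit and no forbidden one did.
        if rule.required <= hits and not (rule.forbidden & hits):
            return rule.label
    return "unclassified"


if __name__ == "__main__":
    print(classify({"hmm_leader_A", "hmm_core_A"}))
    print(classify({"hmm_leader_B"}))
    print(classify({"hmm_leader_B", "hmm_core_A"}))
```

Keeping the rules in an ordered table rather than nested conditionals makes it straightforward to add or reorder (sub-)class rules without touching the matching logic.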