Ankenbrand Markus J, Keller Alexander
Department of Animal Ecology and Tropical Biology, University of Würzburg, Germany.
Genome. 2016 Oct;59(10):783-791. doi: 10.1139/gen-2015-0175. Epub 2016 May 11.
The need for multi-gene analyses in scientific fields such as phylogenetics and DNA barcoding has increased in recent years. In particular, these approaches are increasingly important for differentiating bacterial species, where reliance on the standard 16S rDNA marker can result in poor resolution. Additionally, the assembly of bacterial genomes has become a standard task due to advances in next-generation sequencing technologies. We created a bioinformatic pipeline, bcgTree, which uses assembled bacterial genomes, either from databases or from the user's own sequencing results, to reconstruct their phylogenetic history. The pipeline automatically extracts 107 essential single-copy core genes, found in a majority of bacteria, using hidden Markov models and performs a partitioned maximum-likelihood analysis. Here, we describe the workflow of bcgTree and, as a proof of concept, demonstrate its usefulness in resolving the phylogeny of 293 publicly available bacterial strains of the genus Lactobacillus. We also evaluate its performance on both low- and high-level taxonomy test sets. The tool is freely available on GitHub ( https://github.com/iimog/bcgTree ) and on our institutional homepage ( http://www.dna-analytics.biozentrum.uni-wuerzburg.de ).
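To illustrate what a partitioned analysis needs as input, the following minimal sketch concatenates per-gene alignments into one supermatrix and records the column range of each gene, so that each gene can later be assigned its own substitution model. It is not taken from bcgTree itself: the function name, the gene and strain names, and the default model label "WAG" are hypothetical, and the partition line format ("MODEL, gene = start-end") is assumed to follow the convention used by RAxML-style partition files.

```python
from typing import Dict, List, Tuple


def concatenate_partitions(
    gene_alignments: Dict[str, Dict[str, str]],
    model: str = "WAG",  # hypothetical default; choose the model per data set
) -> Tuple[Dict[str, str], List[str]]:
    """Concatenate per-gene alignments and describe one partition per gene.

    gene_alignments maps gene name -> {taxon name -> aligned sequence}.
    Taxa missing from a gene are padded with gap characters.
    """
    # Collect every taxon that occurs in at least one gene alignment.
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[str] = []
    start = 1  # partition coordinates are 1-based and inclusive

    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are empty or of unequal length")
        length = lengths.pop()

        for taxon in taxa:
            # Pad taxa that lack this gene with gaps of the same length.
            pieces[taxon].append(aln.get(taxon, "-" * length))

        end = start + length - 1
        partitions.append(f"{model}, {gene} = {start}-{end}")
        start = end + 1

    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


if __name__ == "__main__":
    # Toy input with made-up gene and strain names.
    genes = {
        "geneA": {"strain1": "MKV", "strain2": "MKI"},
        "geneB": {"strain1": "LLTP"},
    }
    matrix, parts = concatenate_partitions(genes)
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print("\n".join(parts))
```

Padding absent genes with gaps keeps every strain in the supermatrix even when a core gene is not detected in its assembly, which is one common way to handle incomplete genomes; whether that is appropriate depends on how much data is missing.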