Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA.
Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, 48109, USA.
BMC Bioinformatics. 2021 Feb 15;22(1):70. doi: 10.1186/s12859-021-03981-4.
The quantity of genomic data is expanding at an increasing rate, and tools for phylogenetic analysis that scale to the quantity of available data are required. To address this need, we present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis.
We show that cognac rapidly identifies phylogenetic marker genes using a data-driven approach and efficiently generates concatenated gene alignments for very large genomic datasets. To benchmark our tool, we generated core gene alignments for eight genera of bacteria, including a dataset of over 11,000 genomes from the genus Escherichia, which produced an alignment of 1353 genes and was constructed in less than 17 h.
We demonstrate that cognac presents an efficient method for generating concatenated gene alignments for phylogenetic analysis. We have released cognac as an R package (https://github.com/rdcrawford/cognac) with customizable parameters for adaptation to diverse applications.
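To make the idea of a concatenated core gene alignment concrete, the following is a minimal Python sketch of the general technique. It is not cognac's implementation or API (cognac is an R package, and its internals are not described here). The function names, the toy data, and the assumption that each gene has already been aligned separately are all illustrative. The sketch treats a gene as "core" only if it is present in every genome, then joins the per-gene alignments end to end for each genome.

```python
from typing import Dict, List

# Hypothetical input shape: gene name -> {genome name -> aligned sequence}.
Alignments = Dict[str, Dict[str, str]]


def find_core_genes(alignments: Alignments) -> List[str]:
    """Return the genes that are present in every genome, sorted by name."""
    all_genomes = set()
    for sequences in alignments.values():
        all_genomes.update(sequences)
    return sorted(
        gene for gene, sequences in alignments.items()
        if set(sequences) == all_genomes
    )


def concatenate_core_alignments(alignments: Alignments) -> Dict[str, str]:
    """Join the per-gene alignments of the core genes for each genome."""
    core = find_core_genes(alignments)
    if not core:
        return {}
    genomes = sorted(alignments[core[0]])
    for gene in core:
        # Every row of a single gene alignment must have the same length.
        lengths = {len(seq) for seq in alignments[gene].values()}
        if len(lengths) != 1:
            raise ValueError(f"rows of {gene} differ in length")
    return {
        genome: "".join(alignments[gene][genome] for gene in core)
        for genome in genomes
    }


# Toy data (invented for illustration): geneC is missing from genome g3,
# so it is not a core gene and is left out of the concatenation.
toy_alignments = {
    "geneA": {"g1": "ATG-", "g2": "ATGC", "g3": "A-GC"},
    "geneB": {"g1": "TT", "g2": "TA", "g3": "TT"},
    "geneC": {"g1": "GGG", "g2": "GGA"},
}

concatenated = concatenate_core_alignments(toy_alignments)
for genome, sequence in concatenated.items():
    print(genome, sequence)
```

In this sketch every genome ends up with a sequence of the same total length, which is the property a downstream tree-building program relies on. A strict "present in every genome" rule is only one possible definition of a core gene; the abstract states that cognac exposes customizable parameters, but which ones is not specified here.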