mOTUpan: a robust Bayesian approach to leverage metagenome-assembled genomes for core-genome estimation.

Author information

Buck Moritz, Mehrshad Maliheh, Bertilsson Stefan

Affiliation

Department of Aquatic Sciences and Assessment, Swedish University of Agricultural Sciences, Lennart Hjelms väg 9, 75651 Uppsala, Sweden.

Publication information

NAR Genom Bioinform. 2022 Aug 15;4(3):lqac060. doi: 10.1093/nargab/lqac060. eCollection 2022 Sep.

Abstract

Recent advances in sequencing and bioinformatics have expanded the tree of life by providing genomes for uncultured, environmentally relevant clades, either as metagenome-assembled genomes or as single-cell genomes. While this expanded diversity can provide novel insights into microbial population structure, most tools available for core-genome estimation are sensitive to genome completeness. Consequently, a major portion of the vast phylogenetic diversity uncovered by environmental genomic approaches remains excluded from such analyses. We present mOTUpan, a novel iterative Bayesian method for computing the core genome of sets of genomes spanning a wide range of completeness. For each gene cluster, the likelihood that it belongs to the core or the accessory genome is estimated from the probability of its presence/absence pattern across the target genome set. The core-genome prediction is computationally efficient and scales to thousands of genomes. For high-quality genomes, it gives estimates comparable to those of the state-of-the-art tools Roary and PPanGGOLiN, and it can also use genomes at lower completeness thresholds. Because the accuracy of each run depends on the completeness distribution and the number of genomes in the dataset under scrutiny, mOTUpan includes a bootstrapping procedure to estimate the quality of a specific core-genome prediction. mOTUpan is implemented in the mOTUlizer software package and is available at github.com/moritzbuck/mOTUlizer under the GPL 3.0 license.
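The abstract describes classifying each gene cluster by comparing the probability of its presence/absence pattern under a core and an accessory hypothesis, inside an iterative procedure. The Python sketch that follows illustrates that idea under simplifying assumptions; it is not mOTUlizer's code or API, and the function names (`log_likelihood_ratio`, `estimate_core`), the accessory model, and the clamping bounds are all hypothetical. It assumes a core cluster is observed in genome i with probability equal to that genome's completeness c_i, while an accessory cluster is first carried with some prevalence q (here a pseudo-count frequency) and then observed with probability c_i. Between iterations, each genome's completeness is re-estimated as the fraction of the current core set it contains.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical bounds that keep log(c) and log(1 - c) finite.
C_MIN, C_MAX = 0.01, 0.999


def clamp(c: float) -> float:
    """Restrict a completeness estimate to [C_MIN, C_MAX]."""
    return min(max(c, C_MIN), C_MAX)


def log_likelihood_ratio(pattern: List[bool], completeness: List[float]) -> float:
    """log P(pattern | core) - log P(pattern | accessory), simplified model.

    Core: present in every genome, observed in genome i with probability c_i.
    Accessory (assumed): carried with prevalence q, then observed with
    probability c_i, so P(observed) = q * c_i.
    """
    # Pseudo-count prevalence, kept strictly below 1 so the two models differ.
    q = sum(pattern) / (len(pattern) + 1)
    llr = 0.0
    for seen, c in zip(pattern, completeness):
        if seen:
            # log(c_i) - log(q * c_i) simplifies to -log(q)
            llr += -math.log(q)
        else:
            # A missing core cluster is only likely when genome i is incomplete.
            llr += math.log(1.0 - c) - math.log(1.0 - q * c)
    return llr


def estimate_core(
    genomes: Dict[str, Set[str]],
    prior_completeness: Dict[str, float],
    max_iter: int = 20,
) -> Tuple[Set[str], Dict[str, float]]:
    """Alternate core calling and completeness re-estimation until stable."""
    names = sorted(genomes)
    clusters = sorted(set().union(*genomes.values()))
    completeness = {g: clamp(prior_completeness[g]) for g in names}
    core: Set[str] = set()
    for _ in range(max_iter):
        comp = [completeness[g] for g in names]
        # Equal prior odds: call a cluster core when its LLR is positive.
        new_core = {
            cl
            for cl in clusters
            if log_likelihood_ratio([cl in genomes[g] for g in names], comp) > 0
        }
        if new_core == core:
            break  # converged: the core set did not change
        core = new_core
        if core:
            # Completeness = fraction of the current core present in each genome.
            completeness = {g: clamp(len(core & genomes[g]) / len(core)) for g in names}
    return core, completeness


if __name__ == "__main__":
    toy = {
        "g1": {"A", "B", "C", "X"},
        "g2": {"A", "B", "C"},
        "g3": {"A", "B"},
        "g4": {"A", "C", "X"},
    }
    core, comp = estimate_core(toy, {g: 0.8 for g in toy})
    print(sorted(core))  # ['A', 'B', 'C']
    print({g: round(c, 3) for g, c in comp.items()})
```

In this toy run, `B` and `C` are each missing from one genome but are still called core, because the absence is plausibly explained by incompleteness, whereas `X` (found in two of four genomes) is called accessory. A fuller treatment would add explicit prior odds and a more carefully justified accessory model; the abstract's bootstrapping step for judging prediction quality is not sketched here.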
