Marin Maximillian G, Quinones-Olvera Natalia, Wippel Christoph, Behruznia Mahboobeh, Jeffrey Brendan M, Harris Michael, Mann Brendon C, Rosenthal Alex, Jacobson Karen R, Warren Robin M, Li Heng, Meehan Conor J, Farhat Maha R
Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States.
Department of Biosciences, Nottingham Trent University, Nottingham, NG1 4FQ, United Kingdom.
Bioinformatics. 2025 May 6;41(5). doi: 10.1093/bioinformatics/btaf219.
SUMMARY: Pan-genome analysis is a fundamental tool for studying bacterial genome evolution; however, the variety of methods used to define and measure the pan-genome complicates the interpretation of results and undermines their reliability. Using Mycobacterium tuberculosis, a clonally evolving bacterium with a small accessory genome, as a model system, we systematically evaluated sources of variability in pan-genome estimates. Our analysis revealed that differences in assembly type (short-read versus hybrid), annotation pipeline, and pan-genome software significantly impact predictions of core and accessory genome size. Extending our analysis to two additional bacterial species, Escherichia coli and Staphylococcus aureus, we observed consistent tool-dependent biases but species-specific patterns in pan-genome variability. Our findings highlight the importance of integrating nucleotide- and protein-level analyses to improve the reliability and reproducibility of pan-genome studies across diverse bacterial populations. AVAILABILITY AND IMPLEMENTATION: Panqc is freely available under an MIT license at https://github.com/maxgmarin/panqc.