Genomics and Bioinformatics Unit, Department of Infectious Diseases, National Institute of Health Doutor Ricardo Jorge (INSA), Lisbon, Portugal.
National Reference Centre (NRC) for Whole Genome Sequencing of Microbial Pathogens: Database and Bioinformatics analysis (GENPAT), Istituto Zooprofilattico Sperimentale Dell'Abruzzo E del Molise "Giuseppe Caporale" (IZSAM), Teramo, Italy.
Genome Med. 2023 Jun 15;15(1):43. doi: 10.1186/s13073-023-01196-1.
BACKGROUND: Genomics-informed pathogen surveillance strengthens public health decision-making and plays an important role in the prevention and control of infectious diseases. A pivotal outcome of genomics surveillance is the identification of pathogen genetic clusters and their characterization in terms of geotemporal spread or linkage to clinical and demographic data. This task often consists of the visual exploration of (large) phylogenetic trees and associated metadata, which is time-consuming and difficult to reproduce.
RESULTS: We developed ReporTree, a flexible bioinformatics pipeline that makes it possible to dive into the complexity of pathogen diversity, to rapidly identify genetic clusters at any (or all) distance threshold(s) or cluster stability regions, and to generate surveillance-oriented reports based on the available metadata, such as timespan, geography, or vaccination/clinical status. ReporTree can maintain cluster nomenclature in subsequent analyses and generate a nomenclature code that combines cluster information at different hierarchical levels, thus facilitating the active surveillance of clusters of interest. By handling several input formats and clustering methods, ReporTree is applicable to multiple pathogens and constitutes a flexible resource that can be smoothly deployed in routine surveillance bioinformatics workflows with negligible computational and time costs. This is demonstrated through a comprehensive benchmarking of (i) the cg/wgMLST workflow with large datasets of four foodborne bacterial pathogens and (ii) the alignment-based SNP workflow with a large dataset of Mycobacterium tuberculosis. To further validate the tool, we reproduced a previous large-scale study on Neisseria gonorrhoeae, demonstrating how ReporTree rapidly identifies the main species genogroups and characterizes them with key surveillance metadata, such as antibiotic resistance data. With examples for SARS-CoV-2 and the foodborne bacterial pathogen Listeria monocytogenes, we show how the tool is already a useful asset in genomics-informed routine surveillance and outbreak detection for a wide variety of species.
CONCLUSIONS: In summary, ReporTree is a pan-pathogen tool for the automated and reproducible identification and characterization of genetic clusters that contributes to sustainable and efficient genomics-informed public health pathogen surveillance. ReporTree is implemented in Python 3.8 and is freely available at https://github.com/insapathogenomics/ReporTree.
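To make the idea of clustering at several distance thresholds and combining the results into a hierarchical nomenclature code more concrete, the following is a minimal sketch. It is not ReporTree's code or command-line interface: single-linkage clustering is swapped in as an illustrative method, and the function names, the `C<n>-C<n>` code format, the toy distances, and the metadata fields (`country`, `date`) are all assumptions made for this example.

```python
from itertools import combinations


def single_linkage_clusters(samples, dist, threshold):
    """Group samples linked by a chain of pairwise distances <= threshold."""
    parent = {s: s for s in samples}

    def find(s):
        # Union-find lookup with path halving
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(samples, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    # Number clusters 1, 2, ... in order of first appearance in `samples`
    labels, assignment = {}, {}
    for s in samples:
        root = find(s)
        labels.setdefault(root, len(labels) + 1)
        assignment[s] = labels[root]
    return assignment


def nomenclature_codes(samples, dist, thresholds):
    """Join cluster labels from the loosest to the tightest threshold."""
    levels = [single_linkage_clusters(samples, dist, t)
              for t in sorted(thresholds, reverse=True)]
    return {s: "-".join(f"C{level[s]}" for level in levels) for s in samples}


def summarize(assignment, metadata):
    """Per-cluster size, countries, and first/last ISO sampling date."""
    grouped = {}
    for sample, cluster in assignment.items():
        grouped.setdefault(cluster, []).append(metadata[sample])
    return {
        cluster: {
            "n": len(rows),
            "countries": sorted({r["country"] for r in rows}),
            "first": min(r["date"] for r in rows),
            "last": max(r["date"] for r in rows),
        }
        for cluster, rows in grouped.items()
    }


# Hypothetical toy data: pairwise allele (or SNP) differences and metadata
samples = ["s1", "s2", "s3", "s4"]
dist = {
    frozenset(("s1", "s2")): 2, frozenset(("s1", "s3")): 6,
    frozenset(("s2", "s3")): 5, frozenset(("s1", "s4")): 30,
    frozenset(("s2", "s4")): 31, frozenset(("s3", "s4")): 29,
}
metadata = {
    "s1": {"country": "PT", "date": "2023-01-05"},
    "s2": {"country": "PT", "date": "2023-01-20"},
    "s3": {"country": "IT", "date": "2023-02-01"},
    "s4": {"country": "IT", "date": "2022-11-11"},
}

codes = nomenclature_codes(samples, dist, thresholds=[4, 7])
report = summarize(single_linkage_clusters(samples, dist, 7), metadata)
print(codes)
print(report)
```

In this sketch, samples that share a prefix in their code belong to the same cluster at the looser threshold, which is one simple way to follow a cluster of interest across hierarchical levels. Unlike the tool described above, it does not keep cluster names stable between successive runs.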