Bonnici Vincenzo, Chicco Davide
Dipartimento di Scienze Matematiche Fisiche e Informatiche, Università di Parma, Parma, Italy.
Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy.
BioData Min. 2024 Sep 3;17(1):28. doi: 10.1186/s13040-024-00380-2.
Pangenomics is a relatively new scientific field that investigates the union of all the genomes of a clade. The word pan means "everything" in ancient Greek; the term pangenomics originally referred to bacterial genomes and was later extended to human genomes as well. Modern bioinformatics offers several tools for analyzing pangenomic data, paving the way for an emerging field that we can call computational pangenomics. The computational power now available to the bioinformatics community has made computational pangenomic analyses easy to perform, but this greater accessibility also increases the chance of making mistakes and of producing misleading or inflated results, especially among beginners. To address this problem, we present a few quick tips for efficient and correct computational pangenomic analyses, with a focus on bacterial pangenomics, describing common mistakes to avoid and best practices, drawn from experience, to follow in this field. We believe our recommendations can help readers perform more robust and sound pangenomic analyses and generate more reliable results.