Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, Biopolis, Greece.
Department of Biochemistry, University of Cambridge, Cambridge, UK.
Methods Mol Biol. 2024;2788:139-155. doi: 10.1007/978-1-0716-3782-1_8.
This computational protocol describes how to use pyPGCF, a Python software package that runs in the Linux environment, to analyze bacterial genomes and perform: (i) phylogenomic analysis, (ii) species demarcation, (iii) identification of the core proteins of a bacterial genus and of its individual species, (iv) identification of species-specific fingerprint proteins, which are found in all strains of a species and are absent from all other species of the genus, (v) functional annotation of the core and fingerprint proteins with eggNOG, and (vi) identification of secondary metabolite biosynthetic gene clusters (smBGCs) with antiSMASH. This software has already been used to analyze bacterial genera and species that are important for plants (e.g., Pseudomonas, Bacillus, Streptomyces). In addition, we provide a test dataset and example commands showing how to analyze 165 genomes from 55 species of the genus Bacillus. The main advantages of pyPGCF are that: (i) it uses adjustable orthology cut-offs, (ii) it identifies species-specific fingerprints, and (iii) its computational cost scales linearly with the number of genomes being analyzed. Therefore, pyPGCF can handle a very large number of bacterial genomes, in reasonable timescales, using widely available levels of computing power.
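The definitions of core and fingerprint proteins given above reduce to set operations, which the following minimal sketch illustrates. It is not pyPGCF's code or interface: it assumes that every protein has already been assigned to an orthologous group (the step that pyPGCF performs with adjustable cut-offs), and the function name `core_and_fingerprints`, the input dictionaries, and the group identifiers are all hypothetical.

```python
from collections import defaultdict


def core_and_fingerprints(presence, species_of):
    """Illustrative only; not pyPGCF's implementation.

    presence:   genome ID -> set of orthologous-group IDs found in that genome
    species_of: genome ID -> species name
    Assumes at least one genome is supplied.
    """
    genomes_by_species = defaultdict(list)
    for genome, species in species_of.items():
        genomes_by_species[species].append(genome)

    # Species core: groups present in every strain of the species.
    species_core = {
        sp: set.intersection(*(presence[g] for g in genomes))
        for sp, genomes in genomes_by_species.items()
    }
    # Species pan set: groups present in at least one strain of the species.
    species_pan = {
        sp: set.union(*(presence[g] for g in genomes))
        for sp, genomes in genomes_by_species.items()
    }

    # Genus core: groups present in every genome of every species.
    genus_core = set.intersection(*species_core.values())

    # Fingerprints: in the species core and absent from every strain
    # of every other species in the genus.
    fingerprints = {}
    for sp, core in species_core.items():
        others = set().union(
            *(pan for other, pan in species_pan.items() if other != sp)
        )
        fingerprints[sp] = core - others

    return genus_core, species_core, fingerprints


# Toy data: two species with two strains each (hypothetical group IDs).
presence = {
    "A1": {"og1", "og2", "og3"},
    "A2": {"og1", "og2", "og4"},
    "B1": {"og1", "og3", "og5"},
    "B2": {"og1", "og5"},
}
species_of = {"A1": "sp_A", "A2": "sp_A", "B1": "sp_B", "B2": "sp_B"}

genus_core, species_core, fingerprints = core_and_fingerprints(presence, species_of)
print(genus_core)
print(fingerprints)
```

In this toy dataset, `og1` is the only genus-core group, while `og2` and `og5` are the fingerprints of `sp_A` and `sp_B`, respectively. Group `og3` is not a fingerprint of either species because it occurs in only one strain of each.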