Department of Physics, School of Science, Tianjin University, Tianjin 300072, China.
Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China.
Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac283.
Pan-genome analysis of bacteria provides detailed insight into the diversity and evolution of a bacterial population. However, the genomes included in a pan-genome analysis should be checked carefully: the inclusion of confounding strains adversely affects the identification of core genes, and highly similar strains can bias the inferred pan-genome state (open versus closed). In this study, we found that including highly similar strains also affects the identification of unique genes, leading to a significant underestimation of the number of unique genes in the pan-genome. Therefore, these strains should be excluded from pan-genome analysis at an early stage of data processing. Tens of thousands of Escherichia coli genomes have now been sequenced, which provides an unprecedented opportunity, as well as a challenge, for pan-genome analysis of this classical model organism. Using the proposed strategies, we obtained a high-quality E. coli pan-genome and extracted and analyzed its unique genes. This revealed an association between unique gene clusters and genomic islands from a pan-genome perspective, which may facilitate the identification of genomic islands.
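The observation that highly similar strains deflate the unique-gene count follows from how unique genes are defined: a gene cluster is unique only if it occurs in exactly one genome. If a strain's near-duplicate is also included, that strain's specific genes now occur in two genomes and are no longer counted as unique. The Python sketch below illustrates this on a toy gene presence/absence dataset. It is not the authors' pipeline. The helper names (`unique_genes`, `dereplicate`, `total_unique`), the use of Jaccard similarity on gene content as the redundancy measure, and the 0.95 cutoff are all illustrative assumptions.

```python
from __future__ import annotations


def unique_genes(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each genome, return the gene clusters found in no other genome."""
    result = {}
    for name, genes in genomes.items():
        # Union of gene clusters carried by every *other* genome.
        others = set().union(*(g for n, g in genomes.items() if n != name))
        result[name] = genes - others
    return result


def total_unique(genomes: dict[str, set[str]]) -> int:
    """Total number of unique gene clusters across the pan-genome."""
    return sum(len(u) for u in unique_genes(genomes).values())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two gene sets (assumed redundancy measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dereplicate(genomes: dict[str, set[str]], threshold: float = 0.95) -> dict[str, set[str]]:
    """Greedily keep a genome only if it is below `threshold` similarity
    to every genome kept so far (hypothetical cutoff, order-dependent)."""
    kept: dict[str, set[str]] = {}
    for name, genes in genomes.items():
        if all(jaccard(genes, g) < threshold for g in kept.values()):
            kept[name] = genes
    return kept


# Toy data: 20 shared (core) clusters plus a few strain-specific ones.
core = {f"core{i}" for i in range(20)}
genomes = {
    "A": core | {"a1", "a2"},
    "A_clone": core | {"a1", "a2"},  # redundant strain with the same gene content as A
    "B": core | {"b1"},
    "C": core | {"c1", "c2", "c3"},
}

print(total_unique(genomes))               # with the redundant strain
print(total_unique(dereplicate(genomes)))  # after removing it
```

On this toy data, the redundant pair `A`/`A_clone` hides A's two strain-specific clusters, so the unique-gene total rises from 4 to 6 once `A_clone` is excluded. The greedy, order-dependent filter is used only for brevity; dereplication in practice is commonly based on whole-genome similarity measures such as average nucleotide identity rather than gene-content overlap.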