Luo Yan, Jang Jae Hee, Balkey Maria, Hoffmann Maria
U.S. Food and Drug Administration, Human Foods Program, College Park, MD, USA.
U.S. Department of Agriculture, Agriculture Research Service, Beltsville, MD, USA.
BMC Genom Data. 2025 Feb 28;26(1):15. doi: 10.1186/s12863-025-01304-7.
Whole genome sequencing (WGS) is widely used in food safety for the detection, investigation, and control of foodborne bacterial pathogens. However, the WGS data in most public databases, such as those of the National Center for Biotechnology Information (NCBI), consist primarily of Illumina short reads, which lack important information about repetitive regions, structural variations, and mobile genetic elements, as well as the genomic location of important genes such as antimicrobial resistance (AMR) and virulence genes. To address this limitation, we have contributed 217 closed circular Salmonella enterica genomes, generated using PacBio sequencing, to the NCBI Pathogen Detection (PD) database and GenBank. This dataset improves the accuracy of genome representations in these databases.
High-quality complete reference genomes generated from PacBio long reads provide essential details that are not available in draft genomes assembled from short reads. A complete reference genome allows more accurate data analysis and lets researchers connect genome variations to known genes, regulatory elements, and other genomic features. The 217 complete genomes added here span 78 different Salmonella serovars, and each represents either a distinct SNP cluster within the NCBI PD database or a unique strain, which significantly enriches the diversity of the reference genome database.