National Center for Biotechnology Information, U.S. National Library of Medicine 8600 Rockville Pike, Bethesda MD, 20894, USA.
Microb Genom. 2022 Jun;8(6). doi: 10.1099/mgen.0.000832.
Antimicrobial resistance (AMR) is a significant public health threat. Low-cost whole-genome sequencing, which is often used in surveillance programmes, provides an opportunity to assess the AMR gene content of these genomes using in silico approaches. A variety of bioinformatic tools have been developed to identify these genomic elements. Most of these tools rely on reference databases of nucleotide or protein sequences and on collections of models and rules for analysis. While the tools are critical for identifying AMR genes, the databases themselves are also of significant value to researchers, for applications ranging from sequence analysis to information about AMR phenotypes. Additionally, these databases can be evaluated by domain experts and others to ensure their accuracy. Here we describe how we curate the genes, point mutations and BLAST rules, and hidden Markov models used in NCBI's AMRFinderPlus, along with the quality-control steps we take to ensure database quality. We also describe the web interfaces that display the full structure of the database, and the newly developed relationships between these browsers. Then, using the Reference Gene Catalog as an example, we detail how the databases, rules and models are made publicly available, and how to access the software. In addition, as part of the Pathogen Detection system, we have analysed over 1 million publicly available genomes using AMRFinderPlus and its databases. We discuss how the results of these analyses can be accessed through a web interface. Finally, we conclude with NCBI's plans to keep these databases accessible over the long term.