Spfy: an integrated graph database for real-time prediction of bacterial phenotypes and downstream comparative analyses.

Affiliation

National Microbiology Laboratory at Lethbridge, Public Health Agency of Canada, Lethbridge, Canada.

Publication information

Database (Oxford). 2018 Jan 1;2018:1-10. doi: 10.1093/database/bay086.

DOI:10.1093/database/bay086

PMID:30212910

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6146121/

Abstract

Public health laboratories are currently moving to whole-genome sequence (WGS)-based analyses, and require the rapid prediction of standard reference laboratory methods based solely on genomic data. Currently, these predictive genomics tasks rely on workflows that chain together multiple programs for the requisite analyses. While useful, these systems do not store the analyses in a genome-centric way, meaning the same analyses are often re-computed for the same genomes. To solve this problem, we created Spfy, a platform that rapidly performs the common reference laboratory tests, uses a graph database to store and retrieve the results from the computational workflows and links data to individual genomes using standardized ontologies. The Spfy platform facilitates rapid phenotype identification, as well as the efficient storage and downstream comparative analysis of tens of thousands of genome sequences. Though generally applicable to bacterial genome sequences, Spfy currently contains 10 243 Escherichia coli genomes, for which in-silico serotype and Shiga-toxin subtype, as well as the presence of known virulence factors and antimicrobial resistance determinants have been computed. Additionally, the presence/absence of the entire E. coli pan-genome was computed and linked to each genome. Owing to its database of diverse pre-computed results, and the ability to easily incorporate user data, Spfy facilitates hypothesis testing in fields ranging from population genomics to epidemiology, while mitigating the re-computation of analyses. The graph approach of Spfy is flexible, and can accommodate new analysis software modules as they are developed, easily linking new results to those already stored. Spfy provides a database and analyses approach for E. coli that is able to match the rapid accumulation of WGS data in public databases.
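The genome-centric storage idea described in the abstract can be illustrated with a minimal sketch. It is not Spfy's implementation: the in-memory `TripleStore`, the sequence-hash identifier, the `hasSerotype` predicate, and the `toy_serotyper` function are all hypothetical names introduced here for illustration. The sketch only shows how linking results to a genome node lets a repeated request reuse a stored result instead of re-running the analysis.

```python
import hashlib
from collections import defaultdict


class TripleStore:
    """Tiny in-memory graph of (subject, predicate, object) triples (illustrative only)."""

    def __init__(self):
        self._by_subject = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))

    def objects(self, subject, predicate):
        return sorted(o for p, o in self._by_subject[subject] if p == predicate)

    def has(self, subject, predicate):
        return any(p == predicate for p, _ in self._by_subject[subject])


def genome_id(sequence):
    """Assumption: a genome is identified by a hash of its sequence,
    so submitting the same sequence twice maps to the same graph node."""
    return "genome:" + hashlib.sha1(sequence.encode("ascii")).hexdigest()


def analyse_once(store, sequence, predicate, analysis):
    """Run `analysis` only if no result for `predicate` is stored for this genome."""
    node = genome_id(sequence)
    if not store.has(node, predicate):
        for result in analysis(sequence):
            store.add(node, predicate, result)
    return store.objects(node, predicate)


calls = []  # records how often the toy analysis actually runs


def toy_serotyper(sequence):
    # Placeholder logic, not a real serotyping method.
    calls.append(sequence)
    return ["O157:H7"] if "ACGT" in sequence else ["unknown"]


store = TripleStore()
first = analyse_once(store, "TTACGTAA", "hasSerotype", toy_serotyper)
second = analyse_once(store, "TTACGTAA", "hasSerotype", toy_serotyper)
```

Here `second` is answered from the store, so `toy_serotyper` runs once. A new analysis module would fit the same pattern by writing its results under a new predicate on the existing genome node.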

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/48df/6146121/5a8590e10b89/bay086f1.jpg

Similar articles

SuperPhy: predictive genomics for the bacterial pathogen Escherichia coli.

BMC Microbiol. 2016 Apr 12;16:65. doi: 10.1186/s12866-016-0680-0.

coliBASE: an online database for Escherichia coli, Shigella and Salmonella comparative genomics.

Nucleic Acids Res. 2004 Jan 1;32(Database issue):D296-9. doi: 10.1093/nar/gkh031.

PanTools: representation, storage and exploration of pan-genomic data.

Bioinformatics. 2016 Sep 1;32(17):i487-i493. doi: 10.1093/bioinformatics/btw455.

Virulence Gene Profiles and Clonal Relationships of Escherichia coli O26:H11 Isolates from Feedlot Cattle as Determined by Whole-Genome Sequencing.

Appl Environ Microbiol. 2016 Jun 13;82(13):3900-3912. doi: 10.1128/AEM.00498-16. Print 2016 Jul 1.

Comparative genomics of European avian pathogenic E. coli (APEC).

BMC Genomics. 2016 Nov 22;17(1):960. doi: 10.1186/s12864-016-3289-7.

PanWeb: A web interface for pan-genomic analysis.

PLoS One. 2017 May 24;12(5):e0178154. doi: 10.1371/journal.pone.0178154. eCollection 2017.

ECTyper: serotype and species prediction from raw and assembled whole-genome sequence data.

Microb Genom. 2021 Dec;7(12). doi: 10.1099/mgen.0.000728.

FDA Escherichia coli Identification (FDA-ECID) Microarray: a Pangenome Molecular Toolbox for Serotyping, Virulence Profiling, Molecular Epidemiology, and Phylogeny.

Appl Environ Microbiol. 2016 May 16;82(11):3384-3394. doi: 10.1128/AEM.04077-15. Print 2016 Jun 1.

CloudMap: a cloud-based pipeline for analysis of mutant genome sequences.

Genetics. 2012 Dec;192(4):1249-69. doi: 10.1534/genetics.112.144204. Epub 2012 Oct 10.

Cited by

An overview of graph databases and their applications in the biomedical domain.

Database (Oxford). 2021 May 18;2021. doi: 10.1093/database/baab026.

Assessing the genomic relatedness and evolutionary rates of persistent verotoxigenic serotypes within a closed beef herd in Canada.

Microb Genom. 2020 Jun;6(6). doi: 10.1099/mgen.0.000376. Epub 2020 Jun 3.

Formal Medical Knowledge Representation Supports Deep Learning Algorithms, Bioinformatics Pipelines, Genomics Data Analysis, and Big Data Processes.

Yearb Med Inform. 2019 Aug;28(1):152-155. doi: 10.1055/s-0039-1677933. Epub 2019 Aug 16.
