Real-Time Pathogen Detection in the Era of Whole-Genome Sequencing and Big Data: Comparison of k-mer and Site-Based Methods for Inferring the Genetic Distances among Tens of Thousands of Salmonella Samples.

Author information

Pettengill James B, Pightling Arthur W, Baugher Joseph D, Rand Hugh, Strain Errol

Affiliation

Biostatistics and Bioinformatics Staff, Center for Food Safety and Applied Nutrition, Food and Drug Administration, 5001 Campus Drive, College Park, MD 20740, United States of America.

Publication information

PLoS One. 2016 Nov 10;11(11):e0166162. doi: 10.1371/journal.pone.0166162. eCollection 2016.

DOI:10.1371/journal.pone.0166162
PMID:27832109
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5104361/
Abstract

The adoption of whole-genome sequencing within the public health realm for molecular characterization of bacterial pathogens has been followed by an increased emphasis on real-time detection of emerging outbreaks (e.g., food-borne Salmonellosis). In turn, large databases of whole-genome sequence data are being populated. These databases currently contain tens of thousands of samples and are expected to grow to hundreds of thousands within a few years. For these databases to be of optimal use, one must be able to quickly interrogate them to accurately determine the genetic distances among a set of samples. Being able to do so is challenging due to both biological (evolutionarily diverse samples) and computational (petabytes of sequence data) issues. We evaluated seven measures of genetic distance, which were estimated from either k-mer profiles (Jaccard, Euclidean, Manhattan, Mash Jaccard, and Mash distances) or nucleotide sites (NUCmer and an extended multi-locus sequence typing (MLST) scheme). When analyzing empirical data (whole-genome sequence data from 18,997 Salmonella isolates), there are features (e.g., genomic, assembly, and contamination) that cause distances inferred from k-mer profiles, which treat absent data as informative, to fail to accurately capture the distance between samples when compared to distances inferred from differences in nucleotide sites. Thus, site-based distances, like NUCmer and extended MLST, are superior in performance, but accessing the computing resources necessary to perform them may be challenging when analyzing large databases.
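The abstract contrasts distances computed from k-mer profiles, which count a k-mer missing from one sample as evidence of difference, with distances computed from aligned nucleotide sites, where missing data can simply be skipped. The Python sketch below illustrates that contrast on toy sequences under stated assumptions. It is not the authors' pipeline: every function name is hypothetical, k-mers are not merged with their reverse complements, the MinHash sketch uses BLAKE2b instead of the MurmurHash3 used by Mash, and `site_distance` and `allele_distance` are simplified stand-ins for NUCmer-derived SNP counts and an extended MLST scheme, assuming the samples are already aligned or allele-called. `mash_distance` follows the formula given in the Mash paper, D = -(1/k) ln(2j/(1+j)), where j is the Jaccard index.

```python
import hashlib
import math
from collections import Counter

ACGT = set("ACGT")
MISSING = set("N-")  # symbols treated as "no data" at an aligned site


def kmer_counts(seq, k):
    """Count length-k substrings, skipping any k-mer that overlaps a non-ACGT base (e.g. N).

    Simplification (assumption): k-mers are not merged with their reverse complements.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= ACGT:
            counts[kmer] += 1
    return counts


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the sets of k-mers present (presence/absence only)."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return 1.0 - len(set(a) & set(b)) / len(union)


def euclidean_distance(a, b):
    """L2 distance between k-mer count vectors; a k-mer absent from a profile counts as 0."""
    return math.sqrt(sum((a[x] - b[x]) ** 2 for x in set(a) | set(b)))


def manhattan_distance(a, b):
    """L1 distance between k-mer count vectors; absent k-mers count as 0."""
    return float(sum(abs(a[x] - b[x]) for x in set(a) | set(b)))


def _hash64(kmer):
    # Stable 64-bit hash. Mash uses MurmurHash3; BLAKE2b is only a stand-in here.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minhash_sketch(kmers, size):
    """Bottom-`size` MinHash sketch: the smallest distinct hash values of the k-mer set."""
    return sorted({_hash64(x) for x in kmers})[:size]


def minhash_jaccard(s1, s2, size):
    """Estimate Jaccard as the fraction of the union's bottom-`size` hashes present in both sketches."""
    union_bottom = sorted(set(s1) | set(s2))[:size]
    if not union_bottom:
        return 1.0
    both = set(s1) & set(s2)
    return sum(1 for h in union_bottom if h in both) / len(union_bottom)


def mash_distance(j, k):
    """Mash distance D = -(1/k) ln(2j / (1 + j)); treated here as 1.0 (maximum) when j == 0."""
    if j <= 0.0:
        return 1.0
    return math.log((1.0 + j) / (2.0 * j)) / k  # algebraically equal to -(1/k) ln(2j/(1+j))


def site_distance(aln1, aln2):
    """Count differing aligned sites, skipping sites that are missing ('N' or '-') in either sample."""
    if len(aln1) != len(aln2):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        1
        for x, y in zip(aln1.upper(), aln2.upper())
        if x not in MISSING and y not in MISSING and x != y
    )


def allele_distance(p1, p2):
    """MLST-style distance: loci with different allele calls; loci absent from either profile are skipped."""
    return sum(1 for locus in set(p1) & set(p2) if p1[locus] != p2[locus])


if __name__ == "__main__":
    k = 8
    reference = "ATGGCGTACGTTAGCCTAGG" "CTAACGTTGCAATCGGATCC" "GTAGCTAGCTTGACCGTAAC"
    # Hypothetical draft assembly of the same genome in which positions 20-39 were not recovered.
    gapped = reference[:20] + "N" * 20 + reference[40:]

    ref_k, gap_k = kmer_counts(reference, k), kmer_counts(gapped, k)
    j = 1.0 - jaccard_distance(ref_k, gap_k)
    print("site distance:", site_distance(reference, gapped))  # 0: missing sites are skipped
    print("Jaccard distance:", jaccard_distance(ref_k, gap_k))  # > 0: absent k-mers count as differences
    print("Euclidean distance:", euclidean_distance(ref_k, gap_k))
    print("Manhattan distance:", manhattan_distance(ref_k, gap_k))
    print("Mash distance:", mash_distance(j, k))
    sketch_j = minhash_jaccard(minhash_sketch(ref_k, 1000), minhash_sketch(gap_k, 1000), 1000)
    print("MinHash Jaccard estimate:", sketch_j)
```

In the demo, masking 20 bases as N leaves the site-based distance at 0, while the Jaccard, Euclidean, Manhattan, and Mash distances all become positive even though no nucleotide actually differs. This is a toy illustration of why distances that treat absent data as informative can be distorted by assembly gaps.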

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4260/5104361/ef9f091d5771/pone.0166162.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4260/5104361/8416e564216b/pone.0166162.g002.jpg

Similar articles

1. Multilocus sequence typing analysis and second-generation sequencing analysis of Salmonella Wandsworth.
   J Clin Lab Anal. 2021 Sep;35(9):e23901. doi: 10.1002/jcla.23901. Epub 2021 Jul 10.
2. MentaLiST - A fast MLST caller for large MLST schemes.
   Microb Genom. 2018 Feb;4(2). doi: 10.1099/mgen.0.000146. Epub 2018 Jan 10.
3. Virulence and resistance genes profiles and clonal relationships of non-typhoidal food-borne Salmonella strains isolated in Tunisia by whole genome sequencing.
   Int J Food Microbiol. 2021 Jan 16;337:108941. doi: 10.1016/j.ijfoodmicro.2020.108941. Epub 2020 Oct 28.
4. Multilocus sequence typing of Salmonella strains by high-throughput sequencing of selectively amplified target genes.
   J Microbiol Methods. 2012 Jan;88(1):127-33. doi: 10.1016/j.mimet.2011.11.004. Epub 2011 Nov 11.
5. Comprehensive assessment of the quality of Salmonella whole genome sequence data available in public sequence databases using the Salmonella in silico Typing Resource (SISTR).
   Microb Genom. 2018 Feb;4(2). doi: 10.1099/mgen.0.000151. Epub 2018 Jan 17.
6. Molecular methods for serovar determination of Salmonella.
   Crit Rev Microbiol. 2015;41(3):309-25. doi: 10.3109/1040841X.2013.837862. Epub 2013 Nov 14.
7. The Salmonella In Silico Typing Resource (SISTR): An Open Web-Accessible Tool for Rapidly Typing and Subtyping Draft Salmonella Genome Assemblies.
   PLoS One. 2016 Jan 22;11(1):e0147101. doi: 10.1371/journal.pone.0147101. eCollection 2016.
8. Massively parallel sequencing of enriched target amplicons for high-resolution genotyping of Salmonella serovars.
   Mol Cell Probes. 2013 Apr;27(2):80-5. doi: 10.1016/j.mcp.2012.11.004. Epub 2012 Nov 29.
9. Characterization of new Salmonella serovars by whole-genome sequencing and traditional typing techniques.
   J Med Microbiol. 2016 Oct;65(10):1074-1078. doi: 10.1099/jmm.0.000325. Epub 2016 Aug 1.

Cited by

1. Assessment of plasmids for relating the 2020 Salmonella enterica serovar Newport onion outbreak to farms implicated by the outbreak investigation.
   BMC Genomics. 2023 Apr 4;24(1):165. doi: 10.1186/s12864-023-09245-0.
2. Polyphyly in widespread serovars and using genomic proximity to choose the best reference genome for bioinformatics analyses.
   Front Public Health. 2022 Sep 8;10:963188. doi: 10.3389/fpubh.2022.963188. eCollection 2022.
3. Evaluation of various distance computation methods for construction of haplotype-based phylogenies from large MLST datasets.
   Mol Phylogenet Evol. 2022 Dec;177:107608. doi: 10.1016/j.ympev.2022.107608. Epub 2022 Aug 11.
4. Using Evolutionary Analyses to Refine Whole-Genome Sequence Match Criteria.
   Front Microbiol. 2022 Jun 16;13:797997. doi: 10.3389/fmicb.2022.797997. eCollection 2022.
5. K-mer based prediction of relatedness and ribotypes.
   Microb Genom. 2022 Apr;8(4). doi: 10.1099/mgen.0.000804.
6. Evaluating the accuracy of Listeria monocytogenes assemblies from quasimetagenomic samples using long and short reads.
   BMC Genomics. 2021 May 26;22(1):389. doi: 10.1186/s12864-021-07702-2.
7. Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination.
   Genome Biol. 2019 Dec 18;20(1):286. doi: 10.1186/s13059-019-1914-x.
8. Phylogenetic Concepts and Tools Applied to Epidemiologic Investigations of Infectious Diseases.
   Microbiol Spectr. 2019 Jul;7(4). doi: 10.1128/microbiolspec.AME-0006-2018.
9. Whole genome sequencing for investigations of meningococcal outbreaks in the United States: a retrospective analysis.
   Sci Rep. 2018 Oct 25;8(1):15803. doi: 10.1038/s41598-018-33622-5.
10. Pan-genome Analyses of the Species , and Identification of Genomic Markers Predictive for Species, Subspecies, and Serovar.
   Front Microbiol. 2017 Jul 31;8:1345. doi: 10.3389/fmicb.2017.01345. eCollection 2017.