STRAIN: an R package for multi-locus sequence typing from whole genome sequencing data.

Affiliations

GSK, Siena, Italy.

Present address: Department of Experimental Oncology, European Institute of Oncology, Milan, Italy.

Publication information

BMC Bioinformatics. 2019 Nov 22;20(Suppl 9):347. doi: 10.1186/s12859-019-2887-1.

DOI: 10.1186/s12859-019-2887-1
PMID: 31757201
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6873635/
Abstract

BACKGROUND

Multi-locus sequence typing (MLST) is a standard typing technique used to associate a sequence type (ST) with a bacterial isolate. When the output of whole genome sequencing (WGS) of a sample is available, the ST can be assigned by directly processing the read set. Current approaches employ read mapping against the MLST loci (SRST2), k-mer distribution (stringMLST), selective assembly (GRAbB) or whole genome assembly (BIGSdb) followed by a BLASTn sequence query. Here we present STRAIN (ST Reduced Assembly IdentificatioN), an R package that implements a hybrid strategy between assembly and mapping of the reads to assign the ST to an isolate starting from its read sets.
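All of the approaches above end in the same final MLST step: each locus gets an allele number, and the resulting allele profile is looked up in the scheme's profile table to obtain the ST. The sketch below illustrates only that lookup. The seven locus names and the two-row profile table are hypothetical placeholders rather than a real scheme, and the code does not reproduce STRAIN's assembly/mapping strategy.

```python
from typing import Dict, Optional, Tuple

# Hypothetical 7-locus scheme; real schemes define their own locus names.
LOCI = ("locA", "locB", "locC", "locD", "locE", "locF", "locG")

# Hypothetical profile table: allele numbers (in LOCI order) -> sequence type.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 2, 3, 1, 4, 1): 11,
    (2, 1, 2, 3, 5, 4, 7): 42,
}


def assign_st(alleles: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the ST for a complete allele profile, or None.

    None means either that some locus has no known allele number
    (missing, uncalled or novel) or that the combination of known
    alleles is absent from the table (a candidate new ST).
    """
    if any(alleles.get(locus) is None for locus in LOCI):
        return None
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


sample = dict(zip(LOCI, (2, 1, 2, 3, 5, 4, 7)))
print(assign_st(sample))  # 42
sample["locC"] = None     # e.g. the locus carries a novel allele
print(assign_st(sample))  # None
```

Returning `None` for both "unknown allele" and "unknown combination" keeps the sketch short. A real caller would probably distinguish the two cases, because only the first one implies that a new allele sequence should be reported.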

RESULTS

Analysis of 540 publicly accessible Illumina read sets showed STRAIN to be more accurate than SRST2, stringMLST and GRAbB at assigning correct alleles and identifying new alleles. STRAIN correctly assigned 3666 out of 3780 alleles (capability to identify correct alleles 97%) and, when presented with samples containing new alleles, identified them in 3730 out of 3780 STs (capability to identify new alleles 98.7%). On the same dataset, the other tested tools achieved a lower capability to identify correct alleles (from 28.5 to 96.9%) and a lower capability to identify new alleles (from 1.1 to 97.1%).
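Both capability figures are simple proportions of the 3780 assessed cases, and the sketch below recomputes them. The abstract does not state that 3780 equals 540 read sets × 7 loci. That reading is an assumption inferred from the arithmetic (540 × 7 = 3780) and the usual size of a classic MLST scheme.

```python
def capability(hits: int, total: int) -> float:
    """Percentage of cases handled correctly, rounded to one decimal place."""
    return round(100 * hits / total, 1)


READ_SETS = 540
LOCI_PER_SCHEME = 7                   # assumption: a classic 7-locus MLST scheme
TOTAL = READ_SETS * LOCI_PER_SCHEME   # 3780

correct_alleles = capability(3666, TOTAL)  # 97.0
new_alleles = capability(3730, TOTAL)      # 98.7
print(correct_alleles, new_alleles)
```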

CONCLUSIONS

STRAIN is a new, accurate method for assigning the alleles and ST to an isolate by processing the raw reads output by WGS. STRAIN can also retrieve new allele sequences when present. Its capability to identify correct and new STs/alleles, evaluated on a benchmark dataset, is higher than that of other existing methods. STRAIN is designed for single-allele typing as well as MLST. Its implementation in R makes allele and ST assignment simple and direct, and ready to be integrated into wider pipelines of downstream bioinformatics analyses.
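Deciding whether a locus carries a known allele or a new one comes down to comparing the recovered locus sequence with the scheme's allele database. The sketch below shows one simple way to make that comparison. An exact match returns the allele number. Otherwise, the closest equal-length allele and its Hamming distance are reported as a candidate new allele. The toy sequences and the equal-length Hamming comparison are simplifying assumptions and are not taken from STRAIN. Handling indels would require an alignment step such as the BLASTn query mentioned in the background.

```python
from typing import Dict, NamedTuple, Optional


class AlleleCall(NamedTuple):
    allele: Optional[int]  # allele number if an exact match exists, else None
    closest: int           # closest known allele number
    mismatches: int        # Hamming distance to the closest allele


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def call_allele(locus_seq: str, db: Dict[int, str]) -> AlleleCall:
    """Exact-match lookup with a nearest-allele fallback for novel sequences.

    Only alleles of the same length as locus_seq are compared
    (simplifying assumption: no indels).
    """
    same_len = {n: s for n, s in db.items() if len(s) == len(locus_seq)}
    if not same_len:
        raise ValueError("no allele of matching length in the database")
    closest = min(same_len, key=lambda n: hamming(same_len[n], locus_seq))
    dist = hamming(same_len[closest], locus_seq)
    return AlleleCall(closest if dist == 0 else None, closest, dist)


# Hypothetical toy allele database for a single locus.
DB = {1: "ACGTACGTAC", 2: "ACGTACGTTC", 3: "ACCTACGTAC"}

print(call_allele("ACGTACGTTC", DB))  # known allele 2, zero mismatches
print(call_allele("ACGTACGGTC", DB))  # novel: one mismatch from allele 2
```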

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7fd3/6873635/0a0ed0f09e2b/12859_2019_2887_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7fd3/6873635/ad8202c2b65e/12859_2019_2887_Fig2_HTML.jpg

References

1
stringMLST: a fast k-mer based tool for multilocus sequence typing.
Bioinformatics. 2017 Jan 1;33(1):119-121. doi: 10.1093/bioinformatics/btw586. Epub 2016 Sep 7.
2
GRAbB: Selective Assembly of Genomic Regions, a New Niche for Genomic Research.
PLoS Comput Biol. 2016 Jun 16;12(6):e1004753. doi: 10.1371/journal.pcbi.1004753. eCollection 2016 Jun.
3
SRST2: Rapid genomic surveillance for public health and hospital microbiology labs.
Genome Med. 2014 Nov 20;6(11):90. doi: 10.1186/s13073-014-0090-6. eCollection 2014.
4
Trimmomatic: a flexible trimmer for Illumina sequence data.
Bioinformatics. 2014 Aug 1;30(15):2114-20. doi: 10.1093/bioinformatics/btu170. Epub 2014 Apr 1.
5
MLST revisited: the gene-by-gene approach to bacterial genomics.
Nat Rev Microbiol. 2013 Oct;11(10):728-36. doi: 10.1038/nrmicro3093. Epub 2013 Sep 2.
6
SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.
J Comput Biol. 2012 May;19(5):455-77. doi: 10.1089/cmb.2012.0021. Epub 2012 Apr 16.
7
Fast gapped-read alignment with Bowtie 2.
Nat Methods. 2012 Mar 4;9(4):357-9. doi: 10.1038/nmeth.1923.
8
BIGSdb: Scalable analysis of bacterial genome variation at the population level.
BMC Bioinformatics. 2010 Dec 10;11:595. doi: 10.1186/1471-2105-11-595.
9
BEDTools: a flexible suite of utilities for comparing genomic features.
Bioinformatics. 2010 Mar 15;26(6):841-2. doi: 10.1093/bioinformatics/btq033. Epub 2010 Jan 28.
10
The Sequence Alignment/Map format and SAMtools.
Bioinformatics. 2009 Aug 15;25(16):2078-9. doi: 10.1093/bioinformatics/btp352. Epub 2009 Jun 8.