FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies.

Author information

Guerrero-Araya Enzo, Muñoz Marina, Rodríguez César, Paredes-Sabja Daniel

Affiliations

Microbiota-Host Interactions and Clostridia Research Group, Facultad de Ciencias de la Vida, Universidad Andrés Bello, Santiago, Chile.

ANID, Millennium Science Initiative Program, Millennium Nucleus in the Biology of the Intestinal Microbiota, Santiago, Chile.

Publication information

Bioinform Biol Insights. 2021 Nov 27;15:11779322211059238. doi: 10.1177/11779322211059238. eCollection 2021.

DOI:10.1177/11779322211059238

PMID:34866905

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8637782/

Abstract

Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on a combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool that is designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table with the ST, allelic profile, and clonal complex or clade (when available), detected for a query, as well as a multi-FASTA file or a series of FASTA files with the concatenated or single allele sequences detected, respectively. FastMLST was validated with 91 different species, with a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. Compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, FastMLST takes advantage of multiple processors to simultaneously type up to 28 000 genomes in less than 10 minutes, reducing processing times by at least 3-fold with 100% concordance to PubMLST, if contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST.
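The typing logic described in the abstract can be illustrated with a minimal sketch. It is not FastMLST's code: the scheme (`TOY_ALLELES`, `TOY_PROFILES`), the function names, and the sequences are all hypothetical, an exact substring match stands in for the BLASTn search, and a thread pool stands in for the tool's per-genome parallelism. It only shows the general idea of looking up one allele per locus, mapping the resulting allelic profile to an ST, and processing each assembly independently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical two-locus scheme: locus -> {allele sequence: allele number}.
TOY_ALLELES = {
    "adk": {"ATGGCA": 1, "ATGGCG": 2},
    "gyrB": {"TTGACC": 1, "TTGACT": 2},
}
LOCI = ("adk", "gyrB")

# Hypothetical profile table: allele numbers in LOCI order -> sequence type.
TOY_PROFILES = {(1, 1): 1, (1, 2): 2, (2, 1): 3}


def type_genome(name, contigs):
    """Return (name, allelic profile, ST or None) for one assembly."""
    profile = []
    for locus in LOCI:
        hit = None
        for seq, number in TOY_ALLELES[locus].items():
            # Assumption: exact substring match replaces a real BLASTn search.
            if any(seq in contig for contig in contigs):
                hit = number
                break
        profile.append(hit)
    profile = tuple(profile)
    # An unknown or incomplete profile has no ST in the table.
    return name, profile, TOY_PROFILES.get(profile)


def type_many(genomes, workers=4):
    """Type every assembly independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda item: type_genome(*item), genomes.items()))


if __name__ == "__main__":
    example = {
        "g1": ["CCATGGCATT", "GGTTGACCAA"],
        "g2": ["ATGGCGAAA", "TTGACT"],
    }
    for row in type_many(example):
        print(row)
```

Because each assembly is typed without reference to any other, the work splits cleanly across workers; a real implementation would more likely launch one BLASTn process per genome than use Python threads.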
