FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies.

Author information

Guerrero-Araya Enzo, Muñoz Marina, Rodríguez César, Paredes-Sabja Daniel

Affiliations

Microbiota-Host Interactions and Clostridia Research Group, Facultad de Ciencias de la Vida, Universidad Andrés Bello, Santiago, Chile.

ANID, Millennium Science Initiative Program, Millennium Nucleus in the Biology of the Intestinal Microbiota, Santiago, Chile.

Publication information

Bioinform Biol Insights. 2021 Nov 27;15:11779322211059238. doi: 10.1177/11779322211059238. eCollection 2021.

Abstract

Multilocus Sequence Typing (MLST) is a precise microbial typing approach at the intra-species level for epidemiologic and evolutionary purposes. It operates by assigning a sequence type (ST) identifier to each specimen, based on the combination of alleles of multiple housekeeping genes included in a defined scheme. The use of MLST has multiplied due to the availability of large numbers of genomic sequences and epidemiologic data in public repositories. However, data processing speed has become problematic due to the massive size of modern datasets. Here, we present FastMLST, a tool designed to perform PubMLST searches using BLASTn and a divide-and-conquer approach that processes each genome assembly in parallel. The output offered by FastMLST includes a table listing the ST, allelic profile, and clonal complex or clade (when available) detected for each query, as well as a multi-FASTA file with the concatenated allele sequences or a series of FASTA files with the individual allele sequences detected. FastMLST was validated with 91 different species, with a wide range of guanine-cytosine content (%GC), genome sizes, and fragmentation levels, and a speed test was performed on 3 datasets with varying genome sizes. By taking advantage of multiple processors, FastMLST can simultaneously type up to 28 000 genomes in less than 10 minutes, reducing processing times by at least 3-fold compared with other tools such as mlst, CGE/MLST, MLSTar, and PubMLST, with 100% concordance to PubMLST if contaminated genomes are excluded from the analysis. The source code, installation instructions, and documentation of FastMLST are available at https://github.com/EnzoAndree/FastMLST.
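The ST assignment and per-genome parallelism described in the abstract can be illustrated with a minimal Python sketch. This is not FastMLST's code. It assumes allele calls have already been made for each genome (in FastMLST that step uses BLASTn against PubMLST alleles). The scheme, the loci `geneA`/`geneB`/`geneC`, the profile table, the status labels `"incomplete"` and `"novel"`, and the functions `assign_st`, `type_genomes` and `concatenated_fasta` are all hypothetical, chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional, Tuple

# Hypothetical toy scheme (not a real PubMLST scheme): loci in a fixed order,
# and a profile table mapping allele-number tuples to sequence types (STs).
LOCI: Tuple[str, ...] = ("geneA", "geneB", "geneC")
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 1, 2): 7,
}

AlleleCalls = Dict[str, Optional[int]]


def assign_st(allele_calls: AlleleCalls) -> str:
    """Return the ST for one genome's allele calls, or a status label."""
    # Build the allelic profile in scheme order; a locus without a call is None.
    profile = tuple(allele_calls.get(locus) for locus in LOCI)
    if any(allele is None for allele in profile):
        return "incomplete"  # hypothetical label: at least one locus was not called
    st = PROFILES.get(profile)
    return str(st) if st is not None else "novel"  # unseen allele combination


def type_genomes(calls_by_genome: Dict[str, AlleleCalls], workers: int = 4) -> Dict[str, str]:
    """Type each genome as an independent task (divide and conquer by genome)."""
    names = list(calls_by_genome)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so zip keeps names aligned.
        results = pool.map(assign_st, (calls_by_genome[n] for n in names))
        return dict(zip(names, results))


def concatenated_fasta(genome: str, allele_seqs: Dict[str, str]) -> str:
    """Concatenate one genome's allele sequences, in scheme order, into a FASTA record."""
    seq = "".join(allele_seqs[locus] for locus in LOCI)
    return f">{genome}\n{seq}\n"


if __name__ == "__main__":
    calls = {
        "genome_1": {"geneA": 1, "geneB": 2, "geneC": 1},
        "genome_2": {"geneA": 3, "geneB": 1, "geneC": 2},
        "genome_3": {"geneA": 2, "geneB": 2, "geneC": 2},
        "genome_4": {"geneA": 1, "geneB": None, "geneC": 1},
    }
    print(type_genomes(calls))
    # prints {'genome_1': '2', 'genome_2': '7', 'genome_3': 'novel', 'genome_4': 'incomplete'}
```

Each genome is typed independently, so the workload splits naturally into one task per assembly. A thread pool is used here only to keep the sketch simple. In a real pipeline where each task launches an external aligner process, threads can keep several cores busy. For CPU-heavy pure-Python work, a `ProcessPoolExecutor` would be the usual choice.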

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6438/8637782/2118d3f9f78c/10.1177_11779322211059238-fig1.jpg