

mStrain: strain-level identification of Yersinia pestis using metagenomic data.

Author information

Qian Xiuwei, Wu Yarong, Zuo Xiujuan, Peng Xin, Guo Yan, Yang Ruifu, Zhang Xianglilan, Cui Yujun

Affiliations

School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China.

State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China.

Publication information

Bioinform Adv. 2023 Sep 15;3(1):vbad115. doi: 10.1093/bioadv/vbad115. eCollection 2023.

Abstract

MOTIVATION

High-resolution detection of target pathogens from metagenomic sequencing data is a major challenge because target pathogens are often present at low concentrations in samples. We introduce mStrain, a novel strain/lineage-level identification tool that utilizes metagenomic data. By extracting sufficient information on single-nucleotide polymorphisms (SNPs), mStrain successfully identified Yersinia pestis at the strain/lineage level, and it can therefore serve as an effective tool for the identification and source tracking of Y. pestis from metagenomic data during a plague outbreak.
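As a rough illustration of what extracting SNP information from metagenomic reads involves, the sketch below tallies the bases that aligned reads carry at a set of known SNP positions and calls the majority allele where coverage allows. It is a hypothetical sketch, not mStrain's implementation: the function name `call_snp_alleles`, the gap-free `(start, sequence)` read representation, and the `min_depth`/`min_fraction` thresholds are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def call_snp_alleles(
    reads: Iterable[Tuple[int, str]],
    snp_positions: Iterable[int],
    min_depth: int = 2,
    min_fraction: float = 0.8,
) -> Dict[int, Optional[str]]:
    """Call the majority allele at each known SNP position (hypothetical helper).

    Assumes each read is a (0-based start, sequence) pair aligned to the
    reference without gaps. A site is reported as None when fewer than
    `min_depth` reads cover it or no allele reaches `min_fraction`.
    """
    sites = set(snp_positions)
    counts: Dict[int, Counter] = {pos: Counter() for pos in sites}

    # Tally the base each read carries at every SNP position it overlaps.
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in counts:
                counts[pos][base.upper()] += 1

    calls: Dict[int, Optional[str]] = {}
    for pos in sorted(sites):
        depth = sum(counts[pos].values())
        if depth < min_depth:
            calls[pos] = None  # too little coverage, common for low-abundance targets
            continue
        allele, n = counts[pos].most_common(1)[0]
        calls[pos] = allele if n / depth >= min_fraction else None
    return calls
```

For example, with reads `[(0, "ACGTA"), (2, "GTTCA"), (3, "TACG")]` and SNP positions `[3, 4, 10]`, the default thresholds call `T` at position 3 (3 of 3 reads) and leave position 4 (only 2 of 3 reads carry `A`) and position 10 (no coverage) uncalled. Raising `min_depth` or `min_fraction` trades sensitivity for accuracy, which matters when the target is present at low concentration.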

DEFINITION


STRAIN-LEVEL IDENTIFICATION: Assigning the reads in the metagenomic sequencing data to an exactly known or most closely representative strain.

LINEAGE-LEVEL IDENTIFICATION: Assigning the reads in the metagenomic sequencing data to a specific lineage on the phylogenetic tree.

CANOSNPS: The unique and typical SNPs present in all representative strains.

ANCESTOR/DERIVED STATE: A SNP is in the ancestor state when its allele matches that of strain IP32953; otherwise, it is in the derived state.
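The definitions above can be tied together in a short sketch: each called allele is labeled ancestor or derived by comparison with the IP32953 allele, a sample is placed on the deepest lineage whose defining canoSNPs (and those of every lineage above it) are all derived, and strain-level identification picks the representative strain sharing the most called alleles. This is an assumed, simplified scheme for illustration, not mStrain's published algorithm; the function names, the `parent` dictionary used to encode the phylogenetic tree, and the toy positions are hypothetical.

```python
from typing import Dict, List, Optional

ANCESTOR, DERIVED = "ancestor", "derived"

# Hypothetical representation: SNP position -> called allele (None = uncalled).
Calls = Dict[int, Optional[str]]


def snp_state(sample_allele: str, ip32953_allele: str) -> str:
    """Ancestor state if the allele matches strain IP32953, otherwise derived."""
    if sample_allele.upper() == ip32953_allele.upper():
        return ANCESTOR
    return DERIVED


def lineage_supported(lineage: str, calls: Calls, reference: Dict[int, str],
                      lineage_snps: Dict[str, List[int]]) -> bool:
    """True if at least one defining SNP was called and every called one is derived."""
    states = [
        snp_state(calls[pos], reference[pos])
        for pos in lineage_snps[lineage]
        if calls.get(pos) is not None
    ]
    return bool(states) and all(s == DERIVED for s in states)


def assign_lineage(calls: Calls, reference: Dict[int, str],
                   lineage_snps: Dict[str, List[int]],
                   parent: Dict[str, Optional[str]]) -> Optional[str]:
    """Deepest lineage whose own and ancestral canoSNPs are all supported."""

    def path(lineage: str) -> List[str]:
        # Walk from the lineage up to the root (parent is None at the root).
        nodes: List[str] = []
        node: Optional[str] = lineage
        while node is not None:
            nodes.append(node)
            node = parent.get(node)
        return nodes

    candidates = [
        lin for lin in lineage_snps
        if all(lineage_supported(n, calls, reference, lineage_snps) for n in path(lin))
    ]
    # Deeper lineages have longer root paths; None when nothing is supported.
    return max(candidates, key=lambda lin: len(path(lin)), default=None)


def closest_strain(calls: Calls, strain_alleles: Dict[str, Dict[int, str]]) -> str:
    """Representative strain sharing the most called alleles with the sample."""

    def shared(profile: Dict[int, str]) -> int:
        return sum(1 for pos, allele in calls.items()
                   if allele is not None and profile.get(pos) == allele)

    return max(strain_alleles, key=lambda s: shared(strain_alleles[s]))
```

For instance, with IP32953 alleles `{100: "A", 200: "C", 300: "G"}`, lineages `L1` (SNP 100) with children `L1.1` (SNP 200) and `L1.2` (SNP 300), and sample calls `{100: "T", 200: "T", 300: "G"}`, `assign_lineage` returns `"L1.1"`: positions 100 and 200 are derived, while position 300 matches IP32953 and so gives no support to `L1.2`. Ties between equally deep lineages are broken by dictionary order here, which a real tool would need to handle more carefully.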

AVAILABILITY AND IMPLEMENTATION

The code for running mStrain, the test dataset, and instructions for running the code can be found at the following GitHub repository: https://github.com/xwqian1123/mStrain.


