Qian Xiuwei, Wu Yarong, Zuo Xiujuan, Peng Xin, Guo Yan, Yang Ruifu, Zhang Xianglilan, Cui Yujun
School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China.
State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China.
Bioinform Adv. 2023 Sep 15;3(1):vbad115. doi: 10.1093/bioadv/vbad115. eCollection 2023.
High-resolution detection of target pathogens from metagenomic sequencing data is a major challenge because target pathogens are present at low concentrations in samples. We introduced mStrain, a novel strain/lineage-level identification tool that uses metagenomic data. mStrain successfully identified Yersinia pestis at the strain/lineage level by extracting sufficient single-nucleotide polymorphism (SNP) information, and can therefore be an effective tool for identification and source tracking of Y. pestis based on metagenomic data during a plague outbreak.
STRAIN-LEVEL IDENTIFICATION: Assigning the reads in the metagenomic sequencing data to an exactly known or most closely representative strain.
LINEAGE-LEVEL IDENTIFICATION: Assigning the reads in the metagenomic sequencing data to a specific lineage on the phylogenetic tree.
The unique and typical SNPs present in all representative strains.
ANCESTOR/DERIVED STATE: An SNP is in the ancestor state when its allele is consistent with the allele of strain IP32953; otherwise, the SNP is in the derived state.
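The ancestor/derived definition can be illustrated with a minimal Python sketch. This is not mStrain's actual code: the function name, the dictionary layout (genome position mapped to allele), and the example positions and alleles are all hypothetical, chosen only to show the comparison against the IP32953 allele.

```python
from typing import Dict


def classify_snp_states(
    reference_alleles: Dict[int, str],
    observed_alleles: Dict[int, str],
) -> Dict[int, str]:
    """Label each observed SNP as 'ancestor' or 'derived'.

    reference_alleles: position -> allele of the reference strain
        (IP32953 in the definition above; values here are made up).
    observed_alleles: position -> allele seen in the sample reads.
    Positions missing from the reference table are skipped.
    """
    states: Dict[int, str] = {}
    for position, allele in observed_alleles.items():
        reference = reference_alleles.get(position)
        if reference is None:
            continue  # no reference allele known for this position
        if allele.upper() == reference.upper():
            states[position] = "ancestor"
        else:
            states[position] = "derived"
    return states


# Hypothetical positions and alleles, for illustration only.
reference = {101: "A", 250: "G", 777: "C"}
observed = {101: "a", 250: "T", 999: "G"}
states = classify_snp_states(reference, observed)
```

In this sketch, position 101 matches the reference allele and is labeled ancestor, position 250 differs and is labeled derived, and position 999 is ignored because the hypothetical reference table has no entry for it.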
The code for running mStrain, the test dataset, and instructions for running the code can be found at the following GitHub repository: https://github.com/xwqian1123/mStrain.