Data Science, Chan Zuckerberg Biohub, San Francisco, California.
Data Science and Biotechnology, Gladstone Institutes, San Francisco, California.
Curr Protoc. 2022 Dec;2(12):e604. doi: 10.1002/cpz1.604.
The Metagenomic Intra-Species Diversity Analysis System 2 (MIDAS2) is a scalable pipeline that identifies single nucleotide variants and gene copy number variants in metagenomes using comprehensive reference databases built from public microbial genome collections (metagenotyping). MIDAS2 is the first metagenotyping tool with functionality to control metagenomic read mapping filters and to customize the reference database to the microbial community, features that improve the precision and recall of detected variants. In this article, we present four basic protocols for the most common use cases of MIDAS2, along with supporting protocols for installation and use. In addition, we provide in-depth guidance on adjusting command line parameters, editing the reference database, optimizing hardware utilization, and understanding the metagenotyping results. All the steps of metagenotyping, from raw sequencing reads to population genetic analysis, are demonstrated with example data in two downloadable sequencing libraries of single-end metagenomic reads representing a mixture of multiple bacterial species. This set of protocols empowers users to accurately genotype hundreds of species in thousands of samples, providing rich genetic data for studying the evolution and strain-level ecology of microbial communities. © 2022 The Authors. Current Protocols published by Wiley Periodicals LLC.
Basic Protocol 1: Species prescreening
Basic Protocol 2: Download MIDAS reference database
Basic Protocol 3: Population single nucleotide variant calling
Basic Protocol 4: Pan-genome copy number variant calling
Support Protocol 1: Installing MIDAS2
Support Protocol 2: Command line inputs
Support Protocol 3: Metagenotyping with a custom collection of genomes
Support Protocol 4: Metagenotyping with advanced parameters
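As a rough illustration of what single nucleotide variant metagenotyping computes, the sketch below counts alleles at each reference position after applying two read-mapping filters, then reports sites where a minor allele is present. It is a minimal, hypothetical sketch and not MIDAS2's implementation. The `Alignment` record, the helper functions, the parameter names `min_mapq`, `min_identity`, `min_depth`, and `min_minor_freq`, and their default values are all assumptions chosen for illustration. A real pipeline works from aligner output with indels, base qualities, and many reference genomes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified gap-free alignment of one read to one reference."""
    ref_start: int  # 0-based reference position of the first aligned base
    bases: str      # aligned read bases (insertions/deletions not modeled)
    mapq: int       # mapping quality reported by the aligner


def identity(aln, ref_seq):
    """Fraction of aligned read bases that match the reference."""
    if not aln.bases:
        return 0.0
    window = ref_seq[aln.ref_start:aln.ref_start + len(aln.bases)]
    matches = sum(r == q for r, q in zip(window, aln.bases))
    return matches / len(aln.bases)


# Hypothetical filter thresholds for illustration only; not MIDAS2 defaults.
def pileup(alignments, ref_seq, min_mapq=20, min_identity=0.9):
    """Count bases per reference position, using only reads that pass both filters."""
    counts = [Counter() for _ in ref_seq]
    for aln in alignments:
        if aln.mapq < min_mapq or identity(aln, ref_seq) < min_identity:
            continue  # drop ambiguously placed or highly divergent reads
        for offset, base in enumerate(aln.bases):
            pos = aln.ref_start + offset
            if 0 <= pos < len(ref_seq):
                counts[pos][base] += 1
    return counts


def variant_sites(counts, min_depth=3, min_minor_freq=0.1):
    """Report (position, major allele, minor allele, minor allele frequency)."""
    sites = []
    for pos, site in enumerate(counts):
        depth = sum(site.values())
        if depth < min_depth or len(site) < 2:
            continue  # too shallow to trust, or monomorphic
        (major, _), (minor, minor_n) = site.most_common(2)
        freq = minor_n / depth
        if freq >= min_minor_freq:
            sites.append((pos, major, minor, round(freq, 3)))
    return sites


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [
        Alignment(0, "ACGTACGTAC", 40),
        Alignment(0, "ACTTACGTAC", 40),  # one mismatch (G>T at position 2)
        Alignment(0, "ACGTACGTAC", 40),
        Alignment(0, "ACTTACGTAC", 5),   # fails the mapping-quality filter
        Alignment(0, "TTTTTTTTTT", 40),  # fails the identity filter
    ]
    print(variant_sites(pileup(reads, ref)))  # [(2, 'G', 'T', 0.333)]
```

Tightening or loosening the two read filters changes which reads contribute to each site's allele counts. This is the kind of trade-off between precision and recall that the abstract attributes to controllable read-mapping filters.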