Smith Byron J, Zhao Chunyu, Dubinkina Veronika, Jin Xiaofan, Zahavi Liron, Shoer Saar, Moltzau-Anderson Jacqueline, Segal Eran, Pollard Katherine S
The Gladstone Institute of Data Science and Biotechnology, San Francisco, California 94158, USA.
Chan Zuckerberg Biohub San Francisco, San Francisco, California 94158, USA.
Genome Res. 2025 May 2;35(5):1247-1260. doi: 10.1101/gr.279543.124.
Metagenomics has greatly expanded our understanding of the human gut microbiome by revealing a vast diversity of bacterial species within and across individuals. Even within a single species, different strains can have highly divergent gene content, affecting traits such as antibiotic resistance, metabolism, and virulence. Methods that harness metagenomic data to resolve strain-level differences in functional potential are crucial for understanding the causes and consequences of this intraspecific diversity. The enormous size of pangenome references, strain mixing within samples, and inconsistent sequencing depth present challenges for existing tools that analyze samples one at a time. To address this gap, we updated the MIDAS pangenome profiler, now released as version 3, and developed StrainPGC, an approach to strain-specific gene content estimation that combines strain tracking and correlations across multiple samples. We validate our integrated analysis using a complex synthetic community of strains from the human gut and find that StrainPGC outperforms existing approaches. Analyzing a large, publicly available metagenome collection from inflammatory bowel disease patients and healthy controls, we catalog the functional repertoires of thousands of strains across hundreds of species, capturing extensive diversity missing from reference databases. Finally, we apply StrainPGC to metagenomes from a clinical trial of fecal microbiota transplantation for the treatment of ulcerative colitis. We identify two strains, from two different donors, that are both frequently transmitted to patients but have notable differences in functional potential. StrainPGC and MIDAS v3 together enable precise, intraspecific pangenomic investigations using large collections of metagenomic data without microbial isolation or de novo assembly.
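The idea of combining strain tracking with cross-sample correlations can be illustrated with a small sketch. This is not the published StrainPGC implementation: the function names, the two thresholds (`min_corr`, `min_ratio`), and the toy depth values are all assumptions chosen for illustration. The sketch assumes that a pangenome profiler has already produced per-sample gene depths and a per-sample species depth, and that strain tracking has already identified the samples in which a given strain dominates. A gene is assigned to the strain if its depth correlates with the species depth across all samples and if its depth in the strain's samples is a large enough fraction of the species depth there.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def estimate_gene_content(gene_depth, species_depth, strain_samples,
                          min_corr=0.4, min_ratio=0.5):
    """Hypothetical helper: return the set of genes assigned to one strain.

    gene_depth     -- dict mapping gene name to a list of depths, one per sample
    species_depth  -- list of species-level depths, one per sample
    strain_samples -- indices of samples where strain tracking found this
                      strain to be dominant (assumed to be given)
    min_corr, min_ratio -- illustrative thresholds, not published values
    """
    species_total = sum(species_depth[i] for i in strain_samples)
    if species_total == 0:
        raise ValueError("species depth is zero in all strain samples")
    selected = set()
    for gene, depths in gene_depth.items():
        # Correlation across all samples filters out genes whose depth does
        # not track the species (e.g. reads cross-mapping from other taxa).
        corr = pearson(depths, species_depth)
        # Depth ratio within the strain's own samples decides presence.
        ratio = sum(depths[i] for i in strain_samples) / species_total
        if corr >= min_corr and ratio >= min_ratio:
            selected.add(gene)
    return selected


# Toy data: six samples; strain A dominates samples 0-1, strain B samples 3-4,
# and the species is absent from samples 2 and 5.
species_depth = [10, 20, 0, 12, 18, 0]
gene_depth = {
    "core_gene": [10, 20, 0, 12, 18, 0],    # present in both strains
    "a_only": [10, 20, 0, 0, 0, 0],         # present only in strain A
    "b_only": [0, 0, 0, 12, 18, 0],         # present only in strain B
    "cross_mapped": [0, 0, 30, 0, 0, 25],   # depth unrelated to the species
}

strain_a_genes = estimate_gene_content(gene_depth, species_depth, [0, 1])
strain_b_genes = estimate_gene_content(gene_depth, species_depth, [3, 4])
print(sorted(strain_a_genes))
print(sorted(strain_b_genes))
```

Pooling depth over several samples of the same strain is what lets an approach of this kind tolerate low or uneven sequencing depth in any one sample, which is the limitation the abstract attributes to tools that analyze samples one at a time.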