Small amounts of misassembly can have disproportionate effects on pangenome-based metagenomic analyses.

Author information

Majernik Stephanie N, Beaver Larry, Bradley Patrick H

Affiliations

Department of Microbiology, The Ohio State University, Columbus, Ohio, USA.

Infectious Diseases Institute, The Ohio State University, Columbus, Ohio, USA.

Publication information

mSphere. 2025 May 27;10(5):e0085724. doi: 10.1128/msphere.00857-24. Epub 2025 Apr 29.

Abstract

Individual genes from microbiomes can drive host-level phenotypes. To help identify such candidate genes, several recent tools estimate microbial gene copy numbers directly from metagenomes. These tools rely on alignments to pangenomes, which, in turn, are derived from the set of all individual genomes from one species. While large-scale metagenomic assembly efforts have made pangenome estimates more complete, mixed communities can also introduce contamination into assemblies, and it is unknown how robust pangenome-based metagenomic analyses are to these errors. To gain insight into this problem, we re-analyzed a case-control study of the gut microbiome in cirrhosis, focusing on commensal Clostridia previously implicated in this disease. We tested for differentially prevalent genes in the [taxon name missing] and then investigated which were likely to be contaminants using sequence similarity searches. Out of 86 differentially prevalent genes, we found that 33 (38%) were probably contaminants originating in taxa such as [genus name missing] and [genus name missing], unrelated genera that were independently correlated with disease status. Our results demonstrate that even small amounts of contamination in metagenome assemblies, below typical quality thresholds, can threaten to overwhelm gene-level metagenomic analyses. However, we also show that such contaminants can be accurately identified using a method based on gene-to-species correlation. After removing these contaminants, we observe that several flagellar motility gene clusters in the [taxon name missing] pangenome are associated with cirrhosis status. We have integrated our analyses into an analysis and visualization pipeline, PanSweep, that can automatically identify cases where pangenome contamination may bias the results of gene-resolved analyses.

Importance

Metagenome-assembled genomes, or MAGs, can be constructed without pure cultures of microbes. Large-scale efforts to build MAGs have yielded more complete pangenomes (i.e., sets of all genes found in one species). Pangenomes allow us to measure strain variation in gene content, which can strongly affect phenotype. However, because MAGs come from mixed communities, they can contaminate pangenomes with unrelated DNA; how much this impacts downstream analyses has not been studied. Using a metagenomic study of gut microbes in cirrhosis as our test case, we investigate how contamination affects analyses of microbial gene content. Surprisingly, even small, typical amounts of MAG contamination (<5%) result in disproportionately high levels of false positive associations (38%). Fortunately, we show that most contaminants can be automatically flagged and provide a simple method for doing so. Furthermore, applying this method reveals a new association between cirrhosis and gut microbial motility.
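
The abstract describes flagging contaminants with "a method based on gene-to-species correlation" but gives no procedure. The following is only a minimal sketch of one way such a check could work, not PanSweep's actual implementation. It assumes a gene is suspicious when its presence across samples correlates best with the abundance of a species other than the one whose pangenome it was assigned to. All names (`pearson`, `flag_contaminants`, `species_A`, `gene_1`, and so on) and the toy numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def flag_contaminants(gene_presence, species_abundance, assigned_species):
    """Return {gene: best_matching_species} for genes that track a different
    species than the one whose pangenome they were assigned to.

    Assumption (not from the paper): "tracks" means highest Pearson correlation
    between per-sample gene presence and per-sample species abundance.
    """
    flagged = {}
    for gene, presence in gene_presence.items():
        best = max(
            species_abundance,
            key=lambda sp: pearson(presence, species_abundance[sp]),
        )
        if best != assigned_species[gene]:
            flagged[gene] = best
    return flagged


# Hypothetical toy data: six samples, two species, two genes.
species_abundance = {
    "species_A": [5.0, 4.0, 6.0, 0.5, 0.2, 0.1],
    "species_B": [0.1, 0.3, 0.2, 3.0, 4.0, 5.0],
}
gene_presence = {
    "gene_1": [1, 1, 1, 0, 0, 0],  # present where species_A is abundant
    "gene_2": [0, 0, 0, 1, 1, 1],  # present where species_B is abundant
}
# Both genes sit in the species_A pangenome in this toy example.
assigned_species = {"gene_1": "species_A", "gene_2": "species_A"}

flagged = flag_contaminants(gene_presence, species_abundance, assigned_species)
print(flagged)  # {'gene_2': 'species_B'}
```

In this toy case `gene_2` is flagged because it follows `species_B` rather than its assigned species. A real analysis would also need a correlation threshold and handling for genes that correlate with no species at all, neither of which is specified in the abstract.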

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/919d/12108083/c186e91c7129/msphere.00857-24.f001.jpg
