Small amounts of misassembly can have disproportionate effects on pangenome-based metagenomic analyses.

Authors

Majernik Stephanie N, Beaver Larry, Bradley Patrick H

Affiliations

Dept. of Microbiology, The Ohio State University, Columbus, OH 43210 USA.

Infectious Diseases Institute, The Ohio State University, Columbus, OH 43210 USA.

Publication information

bioRxiv. 2024 Oct 13:2024.10.11.617902. doi: 10.1101/2024.10.11.617902.

DOI: 10.1101/2024.10.11.617902
PMID: 39416140
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11482961/
Abstract

Individual genes from microbiomes can drive host-level phenotypes. To help identify such candidate genes, several recent tools estimate microbial gene copy numbers directly from metagenomes. These tools rely on alignments to pangenomes, which in turn are derived from the set of all individual genomes from one species. While large-scale metagenomic assembly efforts have made pangenome estimates more complete, mixed communities can also introduce contamination into assemblies, and it is unknown how robust pangenome-based metagenomic analyses are to these errors. To gain insight into this problem, we re-analyzed a case-control study of the gut microbiome in cirrhosis, focusing on commensal Clostridia previously implicated in this disease. We tested for differentially prevalent genes in the , then investigated which were likely to be contaminants using sequence similarity searches. Out of 86 differentially prevalent genes, we found that 33 (38%) were probably contaminants originating in taxa such as and , unrelated genera that were independently correlated with disease status. Our results demonstrate that even small amounts of contamination in metagenome assemblies, below typical quality thresholds, can threaten to overwhelm gene-level metagenomic analyses. However, we also show that such contaminants can be accurately identified using a method based on gene-to-species correlation. After removing these contaminants, we observe that several flagellar motility gene clusters in the pangenome are associated with cirrhosis status. We have integrated our analyses into an analysis and visualization pipeline, PanSweep, that can automatically identify cases where pangenome contamination may bias the results of gene-resolved analyses.
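The abstract does not say which statistical test was used to call genes "differentially prevalent". As a minimal sketch, assuming a two-sided Fisher's exact test on a 2x2 table of gene presence versus case/control status, the idea can be illustrated as follows. The function name and the example counts are hypothetical and are not taken from the paper or from PanSweep.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    a, b: cases with / without the gene
    c, d: controls with / without the gene
    (Assumed test; the source only says "differentially prevalent".)
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene-positive cases with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts: gene present in 8 of 10 cases and 1 of 10 controls.
p_value = fisher_exact_two_sided(8, 2, 1, 9)
```

In a real analysis this test would be run once per gene, so the resulting p-values would also need a multiple-testing correction.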

IMPORTANCE

Metagenome-assembled genomes, or MAGs, can be constructed without pure cultures of microbes. Large scale efforts to build MAGs have yielded more complete pangenomes (i.e., sets of all genes found in one species). Pangenomes allow us to measure strain variation in gene content, which can strongly affect phenotype. However, because MAGs come from mixed communities, they can contaminate pangenomes with unrelated DNA, and how much this impacts downstream analyses has not been studied. Using a metagenomic study of gut microbes in cirrhosis as our test case, we investigate how contamination affects analyses of microbial gene content. Surprisingly, even small, typical amounts of MAG contamination (<5%) result in disproportionately high levels of false positive associations (38%). Fortunately, we show that most contaminants can be automatically flagged, and provide a simple method for doing so. Furthermore, applying this method reveals a new association between cirrhosis and gut microbial motility.
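The text describes flagging contaminants with "a method based on gene-to-species correlation" but gives no further detail. The sketch below shows one way such a check could work, under the assumption that a gene is suspicious when its profile across samples tracks some other species more closely than the species whose pangenome it was assigned to. The use of Pearson correlation, the function names, and the toy data are all illustrative assumptions, not PanSweep's implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def flag_contaminants(gene_profiles, species_abundance, assigned_species):
    """Return {gene: best_matching_species} for genes that correlate best
    with a species other than the one they were assigned to."""
    flagged = {}
    for gene, profile in gene_profiles.items():
        scores = {sp: pearson(profile, ab) for sp, ab in species_abundance.items()}
        best = max(scores, key=scores.get)
        if best != assigned_species:
            flagged[gene] = best
    return flagged


# Toy data (hypothetical): six samples, two species, two genes.
species_abundance = {
    "species_A": [5, 4, 6, 0, 1, 0],
    "species_B": [0, 1, 0, 7, 6, 8],
}
gene_profiles = {
    "gene_1": [1, 1, 1, 0, 0, 0],  # follows species_A
    "gene_2": [0, 0, 0, 1, 1, 1],  # follows species_B
}
flagged = flag_contaminants(gene_profiles, species_abundance, "species_A")
# flagged == {"gene_2": "species_B"}
```

Here both genes are nominally part of the species_A pangenome, but gene_2 is present only where species_B is abundant, so it is flagged as a likely contaminant.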

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ab3/11482961/fe640a17f407/nihpp-2024.10.11.617902v1-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6ab3/11482961/d0c0926e3545/nihpp-2024.10.11.617902v1-f0001.jpg

Similar articles

1
Small amounts of misassembly can have disproportionate effects on pangenome-based metagenomic analyses.
bioRxiv. 2024 Oct 13:2024.10.11.617902. doi: 10.1101/2024.10.11.617902.
2
Small amounts of misassembly can have disproportionate effects on pangenome-based metagenomic analyses.
mSphere. 2025 May 27;10(5):e0085724. doi: 10.1128/msphere.00857-24. Epub 2025 Apr 29.
3
Comprehensive Functional Annotation of Metagenomes and Microbial Genomes Using a Deep Learning-Based Method.
mSystems. 2023 Apr 27;8(2):e0117822. doi: 10.1128/msystems.01178-22. Epub 2023 Mar 7.
4
Large-Scale Metagenome Assembly Reveals Novel Animal-Associated Microbial Genomes, Biosynthetic Gene Clusters, and Other Genetic Diversity.
mSystems. 2020 Nov 3;5(6):e01045-20. doi: 10.1128/mSystems.01045-20.
5
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
6
The Reliability of Metagenome-Assembled Genomes (MAGs) in Representing Natural Populations: Insights from Comparing MAGs against Isolate Genomes Derived from the Same Fecal Sample.
Appl Environ Microbiol. 2021 Feb 26;87(6). doi: 10.1128/AEM.02593-20.
7
Mining metagenomic data to gain a new insight into the gut microbial biosynthetic potential in placental mammals.
Microbiol Spectr. 2024 Oct 3;12(10):e0086424. doi: 10.1128/spectrum.00864-24. Epub 2024 Aug 20.
8
Improved microbial genomes and gene catalog of the chicken gut from metagenomic sequencing of high-fidelity long reads.
Gigascience. 2022 Nov 18;11. doi: 10.1093/gigascience/giac116.
9
Evaluating Assembly and Binning Strategies for Time Series Drinking Water Metagenomes.
Microbiol Spectr. 2021 Dec 22;9(3):e0143421. doi: 10.1128/Spectrum.01434-21. Epub 2021 Nov 3.
10
MetaPGN: a pipeline for construction and graphical visualization of annotated pangenome networks.
Gigascience. 2018 Nov 1;7(11):giy121. doi: 10.1093/gigascience/giy121.
