Filtering out the noise: metagenomic classifiers optimize ancient DNA mapping.

Author information

Ravishankar Shyamsundar, Perez Vilma, Davidson Roberta, Roca-Rada Xavier, Lan Divon, Souilmi Yassine, Llamas Bastien

Affiliations

Australian Centre for Ancient DNA (ACAD) and The Environment Institute, The School of Biological Sciences, University of Adelaide, Adelaide, SA, Australia.

Centre of Excellence for Australian Biodiversity and Heritage, University of Adelaide, Adelaide, SA, Australia.

Publication information

Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae646.

DOI:10.1093/bib/bbae646
PMID:39674265
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11646131/

Abstract

Contamination with exogenous DNA presents a significant challenge in ancient DNA (aDNA) studies of single organisms. Failure to address contamination from microbes, reagents, and present-day sources can affect the interpretation of results. Although field and laboratory protocols exist to limit contamination, there is still a need to accurately distinguish between endogenous and exogenous data computationally. Here, we propose a workflow to reduce exogenous contamination based on a metagenomic classifier. Unlike previous methods, which relied exclusively on the mapping specificity of DNA sequencing reads to a single reference genome to remove contaminating reads, our approach applies Kraken2-based filtering before mapping to the reference genome. Using both simulated and empirical shotgun aDNA data, we show that this workflow is a simple and efficient method that can be used in a wide range of computational environments, including personal machines. We propose strategies to build specific databases used to profile sequencing data that take into consideration available computational resources and prior knowledge about the target taxa and likely contaminants. Our workflow significantly reduces the overall computational resources required during the mapping process and reduces the total runtime by up to ~94%. The most significant impacts are observed in samples with low endogenous DNA content. Importantly, contaminants that would otherwise map to the reference are filtered out by our strategy, reducing false-positive alignments. We also show that our method results in a negligible loss of endogenous data, with no measurable impact on downstream population genetics analyses.
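The abstract describes classifying reads with Kraken2 and filtering them before mapping, but does not state exactly which classifications are retained. The sketch below assumes one plausible rule: keep reads that Kraken2 leaves unclassified or assigns to the target taxon or any of its descendants, then subset the FASTQ before it is passed to the mapper. It assumes Kraken2's default tab-separated per-read output (C/U flag, read ID, numeric taxid, length, k-mer LCA list, without `--use-names`). The functions `is_descendant`, `reads_to_keep` and `filter_fastq`, and the truncated toy lineage, are hypothetical illustrations, not the authors' code.

```python
from typing import Dict, Iterable, List, Set


def is_descendant(taxid: int, ancestor: int, parent: Dict[int, int]) -> bool:
    """True if `taxid` is `ancestor` or lies below it in the child->parent map."""
    seen: Set[int] = set()
    while taxid not in seen:
        if taxid == ancestor:
            return True
        seen.add(taxid)
        # Unknown taxa map to themselves, which ends the walk; so does the
        # NCBI-style root, whose parent is itself.
        taxid = parent.get(taxid, taxid)
    return False


def reads_to_keep(kraken_lines: Iterable[str], target_taxid: int,
                  parent: Dict[int, int], keep_unclassified: bool = True) -> Set[str]:
    """Collect read IDs to retain from Kraken2 per-read output.

    Assumes the default tab-separated format (C/U flag, read ID, numeric
    taxid, length, LCA list); output made with --use-names would need the
    taxid parsed out of "name (taxid N)" first.
    """
    keep: Set[str] = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, read_id, taxid = fields[0], fields[1], int(fields[2])
        if status == "U":
            # Assumed rule: unclassified reads are kept and left to the mapper
            if keep_unclassified:
                keep.add(read_id)
        elif is_descendant(taxid, target_taxid, parent):
            keep.add(read_id)
    return keep


def filter_fastq(fastq_lines: List[str], keep: Set[str]) -> List[str]:
    """Return only the 4-line FASTQ records whose read ID is in `keep`."""
    if len(fastq_lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    kept: List[str] = []
    for i in range(0, len(fastq_lines), 4):
        record = fastq_lines[i:i + 4]
        read_id = record[0][1:].split()[0]  # drop '@' and any description
        if read_id in keep:
            kept.extend(record)
    return kept


if __name__ == "__main__":
    # Toy, truncated lineage for illustration only (not the full NCBI tree)
    parent = {9606: 9605, 9605: 9604, 9604: 1, 562: 561, 561: 1, 1: 1}
    kraken_out = [
        "C\tread1\t9606\t50\t9606:16",
        "C\tread2\t562\t48\t562:14",
        "U\tread3\t0\t45\t0:11",
        "C\tread4\t9604\t52\t9604:18",
    ]
    keep = reads_to_keep(kraken_out, target_taxid=9606, parent=parent)
    print(sorted(keep))  # ['read1', 'read3']

    fastq = []
    for rid in ("read1", "read2", "read3", "read4"):
        fastq += [f"@{rid}", "ACGT", "+", "IIII"]
    print(filter_fastq(fastq, keep)[0::4])  # ['@read1', '@read3']
```

Under this assumed rule, `read4`, which is assigned only to a higher rank (taxid 9604), is discarded even though it could be endogenous. Choosing a broader target taxon retains more such reads at the cost of letting through more closely related contaminants, which is the kind of trade-off the database-building strategies in the abstract are meant to address.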

Figures

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47a3/11646131/faafec68232d/bbae646ga1.jpg
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47a3/11646131/b195321dc309/bbae646f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47a3/11646131/3c028639a024/bbae646f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47a3/11646131/30067ee728ba/bbae646f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47a3/11646131/17fe69e7470e/bbae646f4.jpg

Similar articles

1. Filtering out the noise: metagenomic classifiers optimize ancient DNA mapping.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae646.
2. Systematic benchmark of ancient DNA read mapping.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab076.
3. Benchmarking metagenomics classifiers on ancient viral DNA: a simulation study.
PeerJ. 2022 Mar 24;10:e12784. doi: 10.7717/peerj.12784. eCollection 2022.
4. Improving ancient DNA read mapping against modern reference genomes.
BMC Genomics. 2012 May 10;13:178. doi: 10.1186/1471-2164-13-178.
5. HAYSTAC: A Bayesian framework for robust and rapid species identification in high-throughput sequencing data.
PLoS Comput Biol. 2022 Sep 30;18(9):e1010493. doi: 10.1371/journal.pcbi.1010493. eCollection 2022 Sep.
6. Drastic reduction of false positive species in samples of insects by intersecting the default output of two popular metagenomic classifiers.
PLoS One. 2022 Oct 25;17(10):e0275790. doi: 10.1371/journal.pone.0275790. eCollection 2022.
7. Competitive mapping allows for the identification and exclusion of human DNA contamination in ancient faunal genomic datasets.
BMC Genomics. 2020 Nov 30;21(1):844. doi: 10.1186/s12864-020-07229-y.
8. Facilitating accessible, rapid, and appropriate processing of ancient metagenomic data with AMDirT.
F1000Res. 2024 May 28;12:926. doi: 10.12688/f1000research.134798.2. eCollection 2023.
9. The presence and impact of reference bias on population genomic studies of prehistoric human populations.
PLoS Genet. 2019 Jul 26;15(7):e1008302. doi: 10.1371/journal.pgen.1008302. eCollection 2019 Jul.
10. Recentrifuge: Robust comparative analysis and contamination removal for metagenomics.
PLoS Comput Biol. 2019 Apr 8;15(4):e1006967. doi: 10.1371/journal.pcbi.1006967. eCollection 2019 Apr.
