
YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample.

Author affiliations

Department of Computer Science and Engineering, Pennsylvania State University, State College, PA 16802, United States.

Department of Biology, Pennsylvania State University, State College, PA 16802, United States.

Publication information

Bioinformatics. 2024 Feb 1;40(2). doi: 10.1093/bioinformatics/btae047.

DOI:10.1093/bioinformatics/btae047
PMID:38268451
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10868342/
Abstract

MOTIVATION

In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is determining which genomes from a reference database are present or absent in a given sample metagenome. Existing tools generally return point estimates with no associated measure of confidence or uncertainty. As a result, practitioners have difficulty interpreting the output of these tools, particularly for low-abundance organisms, which often reside in the "noisy tail" of incorrect predictions. Furthermore, few tools account for the fact that reference databases are often incomplete and rarely, if ever, contain exact replicas of the genomes present in an environmentally derived metagenome.

RESULTS

We present solutions for these issues by introducing the algorithm YACHT: Yes/No Answers to Community membership via Hypothesis Testing. This approach introduces a statistical framework that accounts for sequence divergence between the reference and sample genomes, in terms of ANI, as well as incomplete sequencing depth, thus providing a hypothesis test for determining the presence or absence of a reference genome in a sample. After introducing our approach, we quantify its statistical power and how this changes with varying parameters. Subsequently, we perform extensive experiments using both simulated and real data to confirm the accuracy and scalability of this approach.
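The idea of a presence/absence hypothesis test that accounts for ANI and incomplete sequencing depth can be illustrated with a minimal sketch. This is not the YACHT implementation. It assumes a simple mutation model in which each base mutates independently, so a k-mer survives unmutated with probability ANI^k, and it treats the sketched k-mers as independent so that the match count is binomial. Overlapping k-mers are not truly independent, so that assumption is a simplification made here. The function names, the `coverage` parameter, and the example values (k = 31, 200 sketched k-mers) are all hypothetical.

```python
from math import comb


def kmer_survival_prob(ani: float, k: int, coverage: float = 1.0) -> float:
    """Probability that a reference k-mer is observed in the sample.

    Assumption: each base mutates independently, so a k-mer is unmutated
    with probability ani**k. `coverage` is a hypothetical stand-in for
    incomplete sequencing depth (fraction of the genome's k-mers sampled).
    """
    return coverage * ani ** k


def min_matches_for_presence(n: int, p: float, alpha: float) -> int:
    """Smallest count t with P(X <= t) > alpha for X ~ Binomial(n, p).

    Under the null hypothesis "genome is present at the given ANI",
    observing fewer than t matching k-mers happens with probability
    at most alpha, so such an observation rejects presence.
    """
    cumulative = 0.0
    for t in range(n + 1):
        cumulative += comb(n, t) * p ** t * (1.0 - p) ** (n - t)
        if cumulative > alpha:
            return t
    return n


def is_present(observed: int, n: int, ani: float, k: int,
               alpha: float = 0.05, coverage: float = 1.0) -> bool:
    """Yes/no call: True unless the null hypothesis of presence is rejected."""
    p = kmer_survival_prob(ani, k, coverage)
    return observed >= min_matches_for_presence(n, p, alpha)


if __name__ == "__main__":
    # Hypothetical example: 200 sketched k-mers, k = 31, ANI threshold 0.95.
    n, k, ani = 200, 31, 0.95
    threshold = min_matches_for_presence(n, kmer_survival_prob(ani, k), 0.05)
    print("minimum matching k-mers to call present:", threshold)
    print("call with 60 matches:", is_present(60, n, ani, k))
```

Lowering the ANI threshold or the assumed coverage lowers the number of matching k-mers required for a "present" call, which is one way to see how the decision depends on the parameters the abstract mentions.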

AVAILABILITY AND IMPLEMENTATION

The source code implementing this approach is available via Conda and at https://github.com/KoslickiLab/YACHT. We also provide the code for reproducing experiments at https://github.com/KoslickiLab/YACHT-reproducibles.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3673/10868342/e36944c22d04/btae047f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3673/10868342/48af56e901ad/btae047f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3673/10868342/342d6dc4ff0c/btae047f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3673/10868342/fb4e98c80b27/btae047f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3673/10868342/1149049b85a6/btae047f5.jpg
