OptiFit: an Improved Method for Fitting Amplicon Sequences to Existing OTUs.

Affiliations

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.

Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, USA.

Publication information

mSphere. 2022 Feb 23;7(1):e0091621. doi: 10.1128/msphere.00916-21. Epub 2022 Feb 2.

DOI:10.1128/msphere.00916-21
PMID:35107341
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8809378/
Abstract

Assigning amplicon sequences to operational taxonomic units (OTUs) is an important step in characterizing microbial communities across large data sets. A notable difference between de novo clustering and database-dependent reference clustering methods is that OTU assignments from de novo methods may change when new sequences are added. However, one may wish to incorporate new samples into previously clustered data sets without clustering all sequences again, such as when comparing across data sets or deploying machine learning models. Existing reference-based methods produce consistent OTUs but only consider the similarity of each query sequence to a single reference sequence in an OTU, resulting in assignments that are worse than those generated by de novo methods. To provide an efficient method to fit sequences to existing OTUs, we developed the OptiFit algorithm. Inspired by the de novo OptiClust algorithm, OptiFit considers the similarity of all pairs of reference and query sequences to produce OTUs of the best possible quality. We tested OptiFit using four data sets with two strategies: (i) clustering to a reference database and (ii) splitting the data set into a reference and query set, clustering the references using OptiClust, and then clustering the queries to the references. The result is an improved implementation of reference-based clustering. OptiFit produces OTUs of a quality similar to that of OptiClust at faster speeds when using the split data set strategy. OptiFit provides a suitable option for users requiring consistent OTU assignments at the same quality as afforded by de novo clustering methods.

Importance

Advancements in DNA sequencing technology have allowed researchers to affordably generate millions of sequence reads from microorganisms in diverse environments. Efficient and robust software tools are needed to assign microbial sequences into taxonomic groups for characterization and comparison of communities. The OptiClust algorithm produces high-quality groups by comparing sequences to each other, but the assignments can change when new sequences are added to a data set, making it difficult to compare different studies. Other approaches assign sequences to groups by comparing them to sequences in a reference database to produce consistent assignments, but the quality of the groups produced is reduced compared to that with OptiClust. We developed OptiFit, a new reference-based algorithm that produces consistent yet high-quality assignments like OptiClust. OptiFit allows researchers to compare microbial communities across different studies or add new data to existing studies without sacrificing the quality of the group assignments.
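
The idea of fitting query sequences to existing OTUs while considering all reference and query pairs can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Matthews correlation coefficient (MCC) as the quality metric (the abstract does not name a metric), a precomputed set of sequence pairs that fall within the distance threshold, and reference assignments that stay frozen while only queries move. The names `mcc`, `fit_queries`, and `close_pairs` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def mcc(otus, close_pairs):
    """MCC of an OTU assignment, scored over all pairs of sequences.

    otus: dict mapping sequence name -> OTU label
    close_pairs: set of frozenset({a, b}) for pairs within the distance threshold
    """
    tp = fp = tn = fn = 0
    for a, b in combinations(sorted(otus), 2):
        same_otu = otus[a] == otus[b]
        close = frozenset((a, b)) in close_pairs
        if same_otu and close:
            tp += 1   # close pair placed together
        elif same_otu:
            fp += 1   # distant pair placed together
        elif close:
            fn += 1   # close pair split apart
        else:
            tn += 1   # distant pair kept apart
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def fit_queries(ref_otus, queries, close_pairs, max_iter=100):
    """Move each query into the reference OTU that most improves the MCC.

    Reference assignments are never changed (an assumption of this sketch).
    A query that improves nothing stays in its own singleton label "q:<name>",
    which is assumed not to collide with any reference OTU label.
    """
    otus = dict(ref_otus)
    for q in queries:
        otus[q] = f"q:{q}"
    ref_labels = sorted(set(ref_otus.values()))
    for _ in range(max_iter):
        changed = False
        for q in queries:
            best_label, best_score = otus[q], mcc(otus, close_pairs)
            for label in ref_labels + [f"q:{q}"]:
                trial = dict(otus)
                trial[q] = label
                score = mcc(trial, close_pairs)
                if score > best_score:  # strict improvement only
                    best_label, best_score = label, score
            if best_label != otus[q]:
                otus[q] = best_label
                changed = True
        if not changed:  # converged: no query moved in a full pass
            break
    return otus


# Toy data: two reference OTUs and two query sequences.
ref_otus = {"r1": "A", "r2": "A", "r3": "B"}
queries = ["q1", "q2"]
close_pairs = {
    frozenset(p)
    for p in [("r1", "r2"), ("r1", "q1"), ("r2", "q1"), ("r3", "q2")]
}
fitted = fit_queries(ref_otus, queries, close_pairs)
print(fitted["q1"], fitted["q2"], mcc(fitted, close_pairs))
```

In this toy case `q1` is close to both members of OTU `A` and `q2` is close to the only member of OTU `B`, so each query joins the OTU where all of its pairwise relationships agree. The sketch recomputes the MCC from scratch for every trial move, which is fine for illustration but far too slow for real data sets.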

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c016/8809378/6a410f31ee46/msphere.00916-21-f004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c016/8809378/d8d862605da6/msphere.00916-21-f001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c016/8809378/5f4e670e9cf1/msphere.00916-21-f002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c016/8809378/c4e6e511fb47/msphere.00916-21-f003.jpg
