

PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes.

Author information

Gregor Ivan, Dröge Johannes, Schirmer Melanie, Quince Christopher, McHardy Alice C

Affiliations

Max-Planck Research Group for Computational Genomics and Epidemiology, Max-Planck Institute for Informatics, Saarbrücken, Germany; Department of Algorithmic Bioinformatics, Heinrich-Heine-University Düsseldorf, Düsseldorf, Germany; Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany.

The Broad Institute of MIT and Harvard, Cambridge, MA, United States.

Publication information

PeerJ. 2016 Feb 8;4:e1603. doi: 10.7717/peerj.1603. eCollection 2016.

DOI:10.7717/peerj.1603
PMID:26870609
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4748697/
Abstract

Background. Metagenomics is an approach for characterizing environmental microbial communities in situ; it allows their functional and taxonomic characterization and the recovery of sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into 'bins' representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for recovering species bins from deep-branching phyla is the expert-trained PhyloPythiaS package, in which a human expert decides which taxa to incorporate in the model and identifies 'training' sequences based on marker genes directly from the sample. Because of the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise that researchers new to the area lack.

Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of the 4- to 6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows users to analyze Gb-sized metagenomes with inexpensive hardware and to recover species- or genus-level bins with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that, in comparison to the other methods, PhyloPythiaS+ performs especially well for samples originating from novel environments.

Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X at: https://github.com/algbioi/ppsp/wiki.
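The abstract states that PhyloPythiaS+ counts 4-, 5- and 6-mers simultaneously, but it does not describe how. The sketch below is a generic single-pass approach and is not the PhyloPythiaS+ implementation. It assumes a 2-bit nucleotide encoding (A=0, C=1, G=2, T=3), treats any non-ACGT character as a break in the k-mer run, and does not merge reverse complements. The function names `count_kmers_multi` and `to_frequencies` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed 2-bit encoding; any other character (e.g. N) breaks the k-mer run.
_ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_kmers_multi(seq: str, ks: Tuple[int, ...] = (4, 5, 6)) -> Dict[int, List[int]]:
    """Hypothetical helper: count all k-mers for several k in one pass over seq.

    Returns a dict mapping k to a list of length 4**k, indexed by the 2-bit
    code of each k-mer (first base most significant). Not the PhyloPythiaS+ code.
    """
    max_k = max(ks)
    full_mask = (1 << (2 * max_k)) - 1
    masks = {k: (1 << (2 * k)) - 1 for k in ks}
    counts = {k: [0] * (4 ** k) for k in ks}
    code = 0  # rolling code of the last (up to) max_k valid bases
    run = 0   # number of consecutive valid bases ending at the current position
    for ch in seq.upper():
        value = _ENCODE.get(ch)
        if value is None:
            # Ambiguous base: no k-mer may span it, so restart the run.
            code = 0
            run = 0
            continue
        code = ((code << 2) | value) & full_mask
        run += 1
        for k in ks:
            if run >= k:
                # The lowest 2k bits of the rolling code are the k-mer ending here.
                counts[k][code & masks[k]] += 1
    return counts


def to_frequencies(counts: Dict[int, List[int]]) -> List[float]:
    """Concatenate per-k counts (in increasing k) into relative-frequency vectors."""
    vector: List[float] = []
    for k in sorted(counts):
        total = sum(counts[k])
        vector.extend(c / total if total else 0.0 for c in counts[k])
    return vector


if __name__ == "__main__":
    counts = count_kmers_multi("ACGTACGTNACGTAC")
    print({k: sum(v) for k, v in counts.items()})
    print(len(to_frequencies(counts)))
```

The design choice is that every base updates a single rolling integer once, and the shorter k-mers are read off its low bits, so one scan of the sequence serves all three k-mer lengths instead of three separate scans. The resulting concatenated frequency vector is the kind of compositional feature a binning classifier could consume; which features and normalization PhyloPythiaS+ actually uses is not specified here.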


Figures (PMC image files):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3617/4748697/1fabeb0856d0/peerj-04-1603-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3617/4748697/9793b4e97138/peerj-04-1603-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3617/4748697/34dfd0fbea0e/peerj-04-1603-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3617/4748697/ee62cb6b433c/peerj-04-1603-g004.jpg

