TACOA: taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach.

Authors

Diaz Naryttza N, Krause Lutz, Goesmann Alexander, Niehaus Karsten, Nattkemper Tim W

Affiliation

Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany.

Publication

BMC Bioinformatics. 2009 Feb 11;10:56. doi: 10.1186/1471-2105-10-56.

DOI:10.1186/1471-2105-10-56
PMID:19210774
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2653487/
Abstract

BACKGROUND

Metagenomics, the sequencing and analysis of the collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field has the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification, an essential task in the analysis of metagenomic data sets, is still far from solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning.
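To make the idea of combining nearest-neighbor voting with a kernel concrete, here is a minimal Python sketch. It is not the TACOA implementation: the abstract does not give the features, kernel, or thresholds, so the dinucleotide profile, the Gaussian kernel, and the names `kmer_profile`, `gaussian_kernel`, `classify`, `width`, `min_support` and `min_share` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=2):
    """Relative frequency of every DNA k-mer in seq (assumed feature choice)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def gaussian_kernel(x, y, width=0.1):
    """Similarity in (0, 1]; it decays with squared Euclidean distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * width ** 2))


def classify(fragment, references, k=2, width=0.1,
             min_support=0.05, min_share=0.6):
    """Kernel-weighted vote over all reference sequences.

    references: list of (label, sequence) pairs.
    Returns the winning label, or "unknown" when no reference is similar
    enough (min_support) or no label clearly dominates (min_share).
    Both thresholds are illustrative assumptions.
    """
    query = kmer_profile(fragment, k)
    scores = Counter()
    for label, seq in references:
        scores[label] += gaussian_kernel(query, kmer_profile(seq, k), width)
    total = sum(scores.values())
    if total < min_support:
        return "unknown"
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best if best_score / total >= min_share else "unknown"


# Toy reference set with two made-up classes.
references = [
    ("AT-rich", "ATATATTTAAATATAT" * 5),
    ("GC-rich", "GCGCGGCCGCGCGGCC" * 5),
]
print(classify("ATTATATAATATATTA", references))
print(classify("ACGTACGTACGTACGT", references))
```

In this sketch every reference votes with a weight given by the kernel, so near neighbors dominate without a hard cutoff at k, and a fragment that resembles nothing in the reference set falls back to "unknown".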

RESULTS

Our novel strategy was extensively evaluated using leave-one-out cross-validation on fragments of variable length (800 bp - 50 Kbp) from 373 completely sequenced genomes. TACOA is able to classify genomic fragments of length 800 bp and 1 Kbp with high accuracy down to the rank of class. For longer fragments (≥ 3 Kbp), accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, classifying such fragments to their known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with the latest intrinsic classifier, PhyloPythia, using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp, the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to the rank of class, and low false negative rates were also obtained.
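The leave-one-out evaluation can be sketched as follows, assuming the held-out unit is a whole genome: each genome in turn is removed from the reference set, cut into fragments, and its fragments are classified against the remaining genomes. The abstract does not spell out the protocol, so this reading, the helper names (`fragments`, `gc_content`, `nearest_by_gc`, `leave_one_genome_out`) and the toy GC-content classifier standing in for the real method are all assumptions.

```python
def fragments(seq, length):
    """Non-overlapping fragments of the given length (trailing remainder dropped)."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, length)]


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    return sum(base in "GC" for base in seq) / len(seq)


def nearest_by_gc(fragment, references):
    """Toy stand-in classifier: label of the reference with the closest GC content."""
    target = gc_content(fragment)
    label, _ = min(references, key=lambda ref: abs(gc_content(ref[1]) - target))
    return label


def leave_one_genome_out(genomes, classify, fragment_len):
    """Hold out each genome in turn and classify its fragments.

    genomes: list of (label, sequence) pairs.
    classify: callable taking (fragment, references) and returning a label.
    Returns the fraction of fragments assigned their true label.
    """
    correct = total = 0
    for i, (label, seq) in enumerate(genomes):
        references = genomes[:i] + genomes[i + 1:]  # exclude the held-out genome
        for frag in fragments(seq, fragment_len):
            total += 1
            correct += classify(frag, references) == label
    return correct / total if total else 0.0


# Made-up data: two genomes per class, so a held-out genome still has a class mate.
genomes = [
    ("low_gc", "ATATATTA" * 4),
    ("low_gc", "TTAATATA" * 4),
    ("high_gc", "GCGCGGCC" * 4),
    ("high_gc", "CCGGCGCG" * 4),
]
print(leave_one_genome_out(genomes, nearest_by_gc, fragment_len=8))
```

Excluding the source genome from the references is what keeps the estimate honest: a fragment is never matched against the sequence it was cut from.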

CONCLUSION

An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved competitive with the most current classifier, PhyloPythia, and has the advantage that it can be installed locally and the reference set can be kept up to date.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a66/2653487/ab1fbede100b/1471-2105-10-56-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a66/2653487/99a0de11d68b/1471-2105-10-56-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a66/2653487/dc3eabac85ef/1471-2105-10-56-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a66/2653487/1bf25510acea/1471-2105-10-56-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a66/2653487/f72e2b72b7e8/1471-2105-10-56-5.jpg
