
Accurate annotation of protein coding sequences with IDTAXA.

Authors

Cooley Nicholas P, Wright Erik S

Affiliation

Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206, USA.

Publication information

NAR Genom Bioinform. 2021 Sep 16;3(3):lqab080. doi: 10.1093/nargab/lqab080. eCollection 2021 Sep.

DOI: 10.1093/nargab/lqab080
PMID: 34541527
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8445202/

Abstract

The observed diversity of protein coding sequences continues to increase far more rapidly than knowledge of their functions, making classification algorithms essential for assigning a function to proteins using only their sequence. Most pipelines for annotating proteins rely on searches for homologous sequences in databases of previously annotated proteins using BLAST or HMMER. Here, we develop a new approach for classifying proteins into a taxonomy of functions and demonstrate its utility for genome annotation. Our algorithm, IDTAXA, was more accurate than BLAST or HMMER at assigning sequences to KEGG ortholog groups. Moreover, IDTAXA correctly avoided classifying sequences with novel functions to existing groups, which is a common error mode for classification approaches that rely on E-values as a proxy for confidence. We demonstrate IDTAXA's utility for annotating eukaryotic and prokaryotic genomes by assigning functions to proteins within a multi-level ontology, and we apply IDTAXA to detect contamination in eukaryotic genomes. Finally, we re-annotated 8604 microbial genomes with known antibiotic resistance phenotypes and discovered two novel associations between proteins and antibiotic resistance. IDTAXA is available as a web tool (http://DECIPHER.codes/Classification.html) or as part of the open source DECIPHER R package from Bioconductor.
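
The abstract's point about not forcing novel sequences into existing groups can be illustrated with a small toy classifier. The sketch below is not the IDTAXA algorithm and does not use the DECIPHER API. It is a hypothetical Python illustration, assuming a simple scheme in which random subsets of a query's k-mers vote for a group and the query is left unclassified when the share of votes falls below a threshold. The function names, group labels, sequences, and parameter values are all invented for the example.

```python
import random
from collections import Counter


def kmers(seq, k=3):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def classify(query, training, k=3, n_boot=100, threshold=0.6, seed=0):
    """Toy confidence-thresholded classifier (hypothetical, not IDTAXA).

    training maps a group label to a list of member sequences.
    Returns (label, confidence); label is "unclassified" when the
    fraction of bootstrap replicates won by the top group is below
    the threshold.
    """
    rng = random.Random(seed)
    group_kmers = {
        group: {km for s in seqs for km in kmers(s, k)}
        for group, seqs in training.items()
    }
    query_kmers = kmers(query, k)
    if not query_kmers:
        return ("unclassified", 0.0)

    sample_size = max(1, len(query_kmers) // 3)
    votes = Counter()
    for _ in range(n_boot):
        # Resample a subset of the query's k-mers with replacement.
        sample = [rng.choice(query_kmers) for _ in range(sample_size)]
        scores = {
            group: sum(km in ks for km in sample)
            for group, ks in group_kmers.items()
        }
        best = max(scores.values())
        winners = [g for g, s in scores.items() if s == best]
        # A replicate only counts when one group clearly matches best.
        if best > 0 and len(winners) == 1:
            votes[winners[0]] += 1

    if not votes:
        return ("unclassified", 0.0)
    group, count = votes.most_common(1)[0]
    confidence = count / n_boot
    if confidence >= threshold:
        return (group, confidence)
    return ("unclassified", confidence)


# Hypothetical training set: two made-up groups with made-up sequences.
TRAINING = {
    "group_A": ["MKTAYIAKQRQISFVKSHFSRQ"],
    "group_B": ["GGSPLLWWDDEECCNNGGHHPP"],
}

print(classify("MKTAYIAKQRQISFVKSHFSRQ", TRAINING))  # known sequence
print(classify("VVVVVVLLLLLLIIIIII", TRAINING))      # shares no k-mers
```

In this toy setup a query that shares no k-mers with any group receives no votes and stays unclassified. A best-hit rule with no confidence threshold would instead always return the top-scoring group, however weak the match.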



