MetaDomain: a profile HMM-based protein domain classification tool for short sequences.

Authors

Zhang Yuan, Sun Yanni

Affiliation

Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.

Publication

Pac Symp Biocomput. 2012:271-82.

PMID: 22174282
Abstract

Protein homology search provides the basis for functional profiling in metagenomic annotation. Profile HMM-based methods classify reads into annotated protein domain families and can achieve better sensitivity for remote protein homology search than pairwise sequence alignment. However, their sensitivity deteriorates as read length decreases. As a result, a large number of short reads cannot be classified into their native domain families. In this work, we introduce MetaDomain, a protein domain classification tool designed for short reads generated by next-generation sequencing technologies. MetaDomain uses relaxed position-specific score thresholds to align more reads to a profile HMM, while using the distribution of alignment positions as an additional constraint to control false positive matches. We apply MetaDomain to the transcriptomic data of a bacterial genome and to a soil metagenomic data set. The experimental results show that it can achieve better sensitivity than the state-of-the-art profile HMM alignment tool in identifying encoded domains from short sequences. The source code of MetaDomain is available at http://sourceforge.net/projects/metadomain/.
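The abstract combines two ideas: a relaxed (lowered) position-specific score threshold that admits the weaker alignments produced by short reads, and a check on where those alignments fall along the model to keep false positives under control. The Python sketch below is a toy illustration of that combination under stated assumptions, not MetaDomain's actual algorithm. The `Hit` record, the subtractive `margin`, and the bin-coverage test (with its `n_bins` and `min_coverage` parameters) are hypothetical stand-ins chosen for clarity; the paper's real scoring and position-distribution statistics may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Hit:
    """One read aligned to one profile HMM (hypothetical record layout)."""
    read_id: str
    start: int    # 0-based model position where the alignment begins
    score: float  # alignment score reported by the aligner


def passes_relaxed_threshold(hit: Hit, thresholds: list[float], margin: float) -> bool:
    # Hypothetical per-position cutoff, lowered by `margin` so weaker
    # (shorter) alignments can still be accepted.
    return hit.score >= thresholds[hit.start] - margin


def position_coverage(hits: list[Hit], model_len: int, n_bins: int) -> float:
    # Fraction of equal-width model bins containing at least one alignment start.
    # This bin-coverage statistic is an illustrative assumption.
    occupied = {min(h.start * n_bins // model_len, n_bins - 1) for h in hits}
    return len(occupied) / n_bins


def classify_reads(hits: list[Hit], thresholds: list[float], model_len: int,
                   margin: float = 2.0, n_bins: int = 10,
                   min_coverage: float = 0.5) -> list[str]:
    """Return read IDs assigned to the family, or [] if the family-level check fails."""
    kept = [h for h in hits if passes_relaxed_threshold(h, thresholds, margin)]
    if not kept:
        return []
    # Reads from a family that is really present should spread along the model;
    # hits piled into one region are treated here as likely false positives.
    if position_coverage(kept, model_len, n_bins) < min_coverage:
        return []
    return [h.read_id for h in kept]


# Toy usage with made-up numbers (not data from the paper).
thresholds = [10.0] * 100                              # hypothetical cutoffs, model length 100
spread = [Hit(f"r{i}", i * 10, 9.0) for i in range(10)]
clustered = [Hit(f"c{i}", 5 + i, 9.0) for i in range(3)]

print(classify_reads(spread, thresholds, 100))              # all ten read IDs
print(classify_reads(clustered, thresholds, 100))           # [] (positions too concentrated)
print(classify_reads(spread, thresholds, 100, margin=0.0))  # [] (strict cutoff rejects every read)
```

In this sketch the position check is applied per family after per-read filtering: lowering the cutoff alone would also admit spurious hits, so the extra constraint is what makes the relaxed threshold usable.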

Similar articles

1. MetaDomain: a profile HMM-based protein domain classification tool for short sequences. Pac Symp Biocomput. 2012:271-82.
2. A sensitive short read homology search tool for paired-end read sequencing data. BMC Bioinformatics. 2017 Oct 16;18(Suppl 12):414. doi: 10.1186/s12859-017-1826-2.
3. HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors. BMC Bioinformatics. 2011 May 24;12:198. doi: 10.1186/1471-2105-12-198.
4. HMM-ModE--improved classification using profile hidden Markov models by optimising the discrimination threshold and modifying emission probabilities with negative training sequences. BMC Bioinformatics. 2007 Mar 27;8:104. doi: 10.1186/1471-2105-8-104.
5. Fast model-based protein homology detection without alignment. Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
6. LSHPlace: fast phylogenetic placement using locality-sensitive hashing. Pac Symp Biocomput. 2013:310-9.
7. Designing patterns for profile HMM search. Bioinformatics. 2007 Jan 15;23(2):e36-43. doi: 10.1093/bioinformatics/btl323.
8. Artificial functional difference between microbial communities caused by length difference of sequencing reads. Pac Symp Biocomput. 2012:259-70.
9. AutoSCOP: automated prediction of SCOP classifications using unique pattern-class mappings. Bioinformatics. 2007 May 15;23(10):1203-10. doi: 10.1093/bioinformatics/btm089. Epub 2007 Mar 22.
10. GATA: a graphic alignment tool for comparative sequence analysis. BMC Bioinformatics. 2005 Jan 17;6:9. doi: 10.1186/1471-2105-6-9.

Cited by

1. Genome-Wide Identification and Functional Characterization of Gene Family Reveal Its Involvement in Response to Stress in Cotton. Int J Mol Sci. 2025 Jan 6;26(1):418. doi: 10.3390/ijms26010418.
2. Classification of metagenomics data at lower taxonomic level using a robust supervised classifier. Evol Bioinform Online. 2015 Jan 26;11:3-10. doi: 10.4137/EBO.S20523. eCollection 2015.
3. A scalable and accurate targeted gene assembly tool (SAT-Assembler) for next-generation sequencing data. PLoS Comput Biol. 2014 Aug 14;10(8):e1003737. doi: 10.1371/journal.pcbi.1003737. eCollection 2014 Aug.
4. Detecting nitrous oxide reductase (NosZ) genes in soil metagenomes: method development and implications for the nitrogen cycle. mBio. 2014 Jun 3;5(3):e01193-14. doi: 10.1128/mBio.01193-14.
5. Rapid identification of high-confidence taxonomic assignments for metagenomic data. Nucleic Acids Res. 2012 Aug;40(14):e111. doi: 10.1093/nar/gks335. Epub 2012 Apr 24.