
HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors.

Affiliation

Computer Science and Engineering Department, Michigan State University, East Lansing, USA.

Publication information

BMC Bioinformatics. 2011 May 24;12:198. doi: 10.1186/1471-2105-12-198.

DOI: 10.1186/1471-2105-12-198
PMID: 21609463
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3115854/

Abstract

BACKGROUND

Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create frameshifts, causing conventional profile HMM alignment tools to generate alignments with marginal scores. This makes error-containing gene fragments unclassifiable with conventional tools. Thus, there is a need for an accurate domain classification tool that can detect and correct sequencing errors.
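
To make the frameshift problem concrete, the following minimal Python sketch translates a short coding sequence before and after one extra base is inserted into a homopolymer run. The example sequence and the `translate` helper are illustrative assumptions, not data or code from the paper; only the standard genetic code table is factual.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def translate(dna: str) -> str:
    """Translate in frame 0, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical error-free gene fragment containing the homopolymer run "AAA".
clean_read = "ATGGCCAAAGGTCTGGAA"
# Same fragment with one extra "A" in the run, as a pyrosequencing error might produce.
error_read = "ATGGCCAAAAGGTCTGGAA"

print(translate(clean_read))  # MAKGLE
print(translate(error_read))  # MAKRSG
```

Every residue downstream of the inserted base changes, which is why a protein-level alignment of the error-containing read loses most of its score after the error position.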

RESULTS

We introduce HMM-FRAME, a protein domain classification tool based on an augmented Viterbi algorithm that can incorporate error models from different sequencing platforms. HMM-FRAME corrects sequencing errors and classifies putative gene fragments into domain families. It achieved high error detection sensitivity and specificity in a data set with annotated errors. We applied HMM-FRAME in Targeted Metagenomics and a published metagenomic data set. The results showed that our tool can correct frameshifts in error-containing sequences, generate much longer alignments with significantly smaller E-values, and classify more sequences into their native families.
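
The abstract does not spell out the augmented Viterbi recursion, so the sketch below is only a simplified stand-in for the general idea of a best-path dynamic program that is allowed to step over sequencing errors. It aligns a DNA read to a single protein sequence rather than to a profile HMM, and it uses one fixed penalty instead of a platform-specific error model. The function name `align_with_frameshifts`, the scoring values, and the restriction that inserted bases are only modelled between codons are all assumptions made for illustration.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def align_with_frameshifts(dna, protein, match=2, mismatch=-1, error_penalty=-3):
    """Best-scoring global path of a DNA read against a protein sequence.

    Three moves are allowed (hypothetical toy model):
      codon     : consume 3 nt and 1 residue, scored as match or mismatch
      insertion : consume 1 nt and no residue (an extra base between codons)
      deletion  : consume 2 nt and 1 residue (a codon that lost one base)
    Returns (score, events), where events lists (kind, nucleotide_index).
    """
    neg_inf = float("-inf")
    n, m = len(dna), len(protein)
    score = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0

    for i in range(n + 1):
        for j in range(m + 1):
            candidates = []
            if i >= 3 and j >= 1 and score[i - 3][j - 1] > neg_inf:
                residue = CODON_TABLE[dna[i - 3:i]]
                step = match if residue == protein[j - 1] else mismatch
                candidates.append((score[i - 3][j - 1] + step, "codon"))
            if i >= 1 and score[i - 1][j] > neg_inf:
                candidates.append((score[i - 1][j] + error_penalty, "insertion"))
            if i >= 2 and j >= 1 and score[i - 2][j - 1] > neg_inf:
                candidates.append((score[i - 2][j - 1] + error_penalty, "deletion"))
            if candidates:
                score[i][j], back[i][j] = max(candidates, key=lambda c: c[0])

    if score[n][m] == neg_inf:
        raise ValueError("read and protein cannot be aligned end to end")

    # Trace back from the final cell, recording only the error events.
    events = []
    i, j = n, m
    while (i, j) != (0, 0):
        kind = back[i][j]
        if kind == "codon":
            i, j = i - 3, j - 1
        elif kind == "insertion":
            events.append((kind, i - 1))
            i -= 1
        else:  # deletion
            events.append((kind, i - 2))
            i, j = i - 2, j - 1
    events.reverse()
    return score[n][m], events


# Hypothetical read with one extra "A" in a homopolymer run.
best_score, errors = align_with_frameshifts("ATGGCCAAAAGGTCTGGAA", "MAKGLE")
print(best_score, errors)
```

With these toy scores, paying one error penalty lets the remaining codons line up with the target again. This mirrors, in a much reduced form, the reported effect of longer alignments with better scores once frameshifts are corrected.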

CONCLUSIONS

HMM-FRAME provides a complementary protein domain classification tool to conventional profile HMM-based methods for data sets containing frameshifts. Its current implementation is best used for small-scale metagenomic data sets. The source code of HMM-FRAME can be downloaded at http://www.cse.msu.edu/~zhangy72/hmmframe/ and at https://sourceforge.net/projects/hmm-frame/.
