HMM-FRAME: accurate protein domain classification for metagenomic sequences containing frameshift errors.

Author information

Computer Science and Engineering Department, Michigan State University, East Lansing, USA.

Publication information

BMC Bioinformatics. 2011 May 24;12:198. doi: 10.1186/1471-2105-12-198.

Abstract

BACKGROUND

Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create frameshifts, causing conventional profile HMM alignment tools to generate alignments with marginal scores. This makes error-containing gene fragments unclassifiable with conventional tools. Thus, there is a need for an accurate domain classification tool that can detect and correct sequencing errors.
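To make the frameshift problem concrete, here is a minimal Python sketch of how one extra base in a homopolymer run changes every downstream codon. The sequences and the `translate` helper are made up for illustration and do not come from the paper's data; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 0, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Hypothetical gene fragment containing an "AAA" homopolymer run.
clean_read = "ATGGCTAAAGGTCTGGAACGT"
# The same fragment with one extra "A" in the run, the kind of
# insertion that pyrosequencing tends to produce.
error_read = "ATGGCTAAAAGGTCTGGAACGT"

print(translate(clean_read))  # MAKGLER
print(translate(error_read))  # MAKRSGT: every residue after the run differs
```

The two translations share only the residues before the error, which is why a protein-level profile HMM alignment of the erroneous read tends to be short and weakly scored.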

RESULTS

We introduce HMM-FRAME, a protein domain classification tool based on an augmented Viterbi algorithm that can incorporate error models from different sequencing platforms. HMM-FRAME corrects sequencing errors and classifies putative gene fragments into domain families. It achieved high error detection sensitivity and specificity in a data set with annotated errors. We applied HMM-FRAME to Targeted Metagenomics data and to a published metagenomic data set. The results showed that our tool can correct frameshifts in error-containing sequences, generate much longer alignments with significantly smaller E-values, and classify more sequences into their native families.
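The abstract does not spell out the augmented Viterbi recursion, so the following is only a toy sketch of the general idea, not HMM-FRAME's actual algorithm. It assumes a heavily simplified model: the profile is reduced to a consensus peptide with match states only, each state normally consumes 3 nucleotides, and a state may instead consume 4 (one inserted base) or 2 (one deleted base) at a fixed penalty. The scores `MATCH`, `MISMATCH`, and `FRAMESHIFT_PENALTY` and the function name `frameshift_viterbi` are hypothetical; a real error model would derive penalties from the sequencing platform.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical scores chosen for illustration only.
MATCH, MISMATCH = 2.0, -1.0
FRAMESHIFT_PENALTY = -3.0
NEG_INF = float("-inf")


def emit(target_aa, codon):
    """Score a codon against the residue expected at a consensus position."""
    return MATCH if CODON_TABLE[codon] == target_aa else MISMATCH


def frameshift_viterbi(dna, consensus):
    """Best-scoring path aligning all of `dna` to all of `consensus`.

    Returns (score, steps), where steps[j] is the number of nucleotides
    (2, 3, or 4) consumed by consensus position j.
    """
    n, m = len(dna), len(consensus)
    score = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    score[0][0] = 0.0
    for j in range(1, m + 1):
        aa = consensus[j - 1]
        for i in range(1, n + 1):
            candidates = []
            # Normal step: one codon of 3 nucleotides.
            if i >= 3 and score[j - 1][i - 3] > NEG_INF:
                candidates.append((score[j - 1][i - 3] + emit(aa, dna[i - 3:i]), 3))
            # Insertion error: 4 nucleotides, drop whichever one scores best.
            if i >= 4 and score[j - 1][i - 4] > NEG_INF:
                chunk = dna[i - 4:i]
                best = max(emit(aa, chunk[:k] + chunk[k + 1:]) for k in range(4))
                candidates.append((score[j - 1][i - 4] + best + FRAMESHIFT_PENALTY, 4))
            # Deletion error: 2 nucleotides, residue unknown, scored as a mismatch.
            if i >= 2 and score[j - 1][i - 2] > NEG_INF:
                candidates.append((score[j - 1][i - 2] + MISMATCH + FRAMESHIFT_PENALTY, 2))
            if candidates:
                score[j][i], back[j][i] = max(candidates, key=lambda c: c[0])
    if score[m][n] == NEG_INF:
        raise ValueError("read length is incompatible with the consensus")
    # Trace back the number of nucleotides consumed at each position.
    steps, i = [], n
    for j in range(m, 0, -1):
        steps.append(back[j][i])
        i -= back[j][i]
    steps.reverse()
    return score[m][n], steps


clean_score, clean_steps = frameshift_viterbi("ATGGCTAAAGGTCTGGAACGT", "MAKGLER")
error_score, error_steps = frameshift_viterbi("ATGGCTAAAAGGTCTGGAACGT", "MAKGLER")
print(clean_score, clean_steps)
print(error_score, error_steps)
```

In this toy setting the read with the extra base still aligns to the full consensus, paying one frameshift penalty instead of losing every downstream residue. Where exactly the 4-nucleotide step lands near the homopolymer run is ambiguous, since several placements score equally.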

CONCLUSIONS

HMM-FRAME provides a complementary protein domain classification tool to conventional profile HMM-based methods for data sets containing frameshifts. Its current implementation is best used for small-scale metagenomic data sets. The source code of HMM-FRAME can be downloaded at http://www.cse.msu.edu/~zhangy72/hmmframe/ and at https://sourceforge.net/projects/hmm-frame/.
