MetaDomain: a profile HMM-based protein domain classification tool for short sequences.

Author information

Zhang Yuan, Sun Yanni

Affiliation

Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA.

Publication information

Pac Symp Biocomput. 2012:271-82.

PMID:22174282

Abstract

Protein homology search provides a basis for functional profiling in metagenomic annotation. Profile HMM-based methods classify reads into annotated protein domain families and can achieve better sensitivity for remote protein homology search than pairwise sequence alignment. However, their sensitivity deteriorates as read length decreases. As a result, a large number of short reads cannot be classified into their native domain families. In this work, we introduce MetaDomain, a protein domain classification tool designed for short reads generated by next-generation sequencing technologies. MetaDomain uses relaxed position-specific score thresholds to align more reads to a profile HMM, while using the distribution of alignment positions as an additional constraint to control false-positive matches. We apply MetaDomain to the transcriptomic data of a bacterial genome and to a soil metagenomic data set. The experimental results show that it achieves better sensitivity than the state-of-the-art profile HMM alignment tool in identifying encoded domains from short sequences. The source code of MetaDomain is available at http://sourceforge.net/projects/metadomain/.
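The abstract states MetaDomain's two ingredients only at a high level: relaxed position-specific score thresholds, and a positional-distribution constraint against false positives. The Python sketch below shows one way these could fit together. It is a hypothetical illustration, not MetaDomain's implementation. It assumes the following: each model position has a given score threshold; a read is kept if its score reaches a fixed fraction (`relax`) of the threshold at its alignment start; and the positional constraint is measured as the normalized entropy of alignment start positions over equal-width model bins. All names (`Alignment`, `passes_relaxed_threshold`, `positional_evenness`, `classify_family`) and parameters (`relax`, `min_evenness`, `n_bins`) are invented for this example.

```python
from collections import Counter
from dataclasses import dataclass
from math import log


@dataclass
class Alignment:
    read_id: str
    start: int    # 0-based model position where the read's alignment begins
    score: float  # alignment score of the read against the profile HMM


def passes_relaxed_threshold(aln: Alignment, thresholds: list[float], relax: float = 0.5) -> bool:
    # Hypothetical rule: each model position has its own score threshold,
    # and a read is kept if it reaches a relaxed fraction of the threshold
    # at the position where its alignment starts.
    return aln.score >= relax * thresholds[aln.start]


def positional_evenness(starts: list[int], model_length: int, n_bins: int = 10) -> float:
    # Normalized Shannon entropy of alignment start positions over equal-width
    # bins of the model: 1.0 means evenly spread, 0.0 means all in one bin.
    if not starts or n_bins < 2:
        return 0.0
    bins = Counter(min(s * n_bins // model_length, n_bins - 1) for s in starts)
    total = len(starts)
    entropy = -sum((c / total) * log(c / total) for c in bins.values())
    return entropy / log(n_bins)


def classify_family(alignments: list[Alignment], thresholds: list[float],
                    model_length: int, relax: float = 0.5,
                    min_evenness: float = 0.8) -> list[str]:
    # Step 1 (assumed): relaxed per-position thresholds admit more short reads.
    kept = [a for a in alignments if passes_relaxed_threshold(a, thresholds, relax)]
    # Step 2 (assumed): if the kept reads pile up on a few model positions,
    # treat the whole set as likely false positives for this family.
    if positional_evenness([a.start for a in kept], model_length) < min_evenness:
        return []
    return [a.read_id for a in kept]
```

Example: for a 100-position model with every threshold set to 20.0, ten reads scoring 12.0 whose alignments start at positions 0, 10, ..., 90 are all accepted, because their starts are spread evenly across the model. Ten reads with the same scores that all start at position 5 are rejected. Using entropy here is only one simple choice of evenness measure. A goodness-of-fit test against an expected start-position distribution would be an equally plausible reading of the abstract.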
