Variations on probabilistic suffix trees: statistical modeling and prediction of protein families.

Authors

Bejerano G, Yona G

Affiliation

School of Computer Science and Engineering, Hebrew University, Jerusalem 91904, Israel.

Publication

Bioinformatics. 2001 Jan;17(1):23-43. doi: 10.1093/bioinformatics/17.1.23.

Abstract

MOTIVATION

We present a method for modeling protein families by means of probabilistic suffix trees (PSTs). The method is based on identifying significant patterns in a set of related protein sequences. The patterns can be of arbitrary length, and the input sequences need not be aligned, nor is delineation of domain boundaries required. The method is automatic and can be applied, without assuming any preliminary biological information, with surprising success. Basic biological considerations, such as amino acid background probabilities and amino acid substitution probabilities, can be incorporated to improve performance.

RESULTS

The PST can serve as a predictive tool for protein sequence classification, and for detecting conserved patterns (possibly functionally or structurally important) within protein sequences. The method was tested on the Pfam database of protein families with more than satisfactory performance. Exhaustive evaluations show that the PST model detects many more related sequences than pairwise methods such as Gapped-BLAST, and is almost as sensitive as a hidden Markov model trained from a multiple alignment of the input sequences, while being much faster.
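As a rough illustration of the idea in the abstract, the sketch below builds a simplified variable-order Markov model from unaligned sequences and scores a query by its average per-residue log-probability. It is not the authors' algorithm: the names `train_contexts`, `next_symbol_prob`, and `log_score`, and the parameters `max_depth`, `min_count`, and `pseudocount`, are hypothetical, and the pruning and smoothing rules are assumptions chosen for brevity.

```python
import math
from collections import defaultdict

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def train_contexts(sequences, max_depth=3, min_count=2):
    """Count which residue follows every context of length 0..max_depth.

    Assumption: a context is kept if it was observed at least `min_count`
    times; this is a simplification of real PST pruning criteria.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for depth in range(0, min(max_depth, i) + 1):
                context = seq[i - depth:i]  # the `depth` residues before position i
                counts[context][symbol] += 1
    return {
        ctx: dict(nxt)
        for ctx, nxt in counts.items()
        if ctx == "" or sum(nxt.values()) >= min_count
    }


def next_symbol_prob(model, history, symbol, max_depth=3, pseudocount=0.01):
    """Probability of `symbol` given the longest retained suffix of `history`."""
    for depth in range(min(max_depth, len(history)), -1, -1):
        context = history[len(history) - depth:]
        if context in model:
            nxt = model[context]
            total = sum(nxt.values())
            # Additive smoothing so unseen residues never get probability zero.
            return (nxt.get(symbol, 0) + pseudocount) / (
                total + pseudocount * len(ALPHABET)
            )
    return 1.0 / len(ALPHABET)  # untrained model: fall back to uniform


def log_score(model, seq, max_depth=3):
    """Average per-residue log-probability of `seq` under the model."""
    total = sum(
        math.log(next_symbol_prob(model, seq[:i], seq[i], max_depth))
        for i in range(len(seq))
    )
    return total / max(len(seq), 1)


if __name__ == "__main__":
    # Toy, made-up "family"; real input would be unaligned family members.
    family = ["ACDEACDE", "ACDEACDF", "CCDEACDE"]
    model = train_contexts(family)
    for query in ["ACDEACDE", "WWWWYYYY"]:
        print(query, round(log_score(model, query), 3))
```

A query resembling the training sequences should receive a higher score than an unrelated one, which is the basis for using such a model as a family classifier. Normalizing by length keeps scores comparable across queries of different sizes; that choice is also an assumption of this sketch.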
