Bayesian models and Markov chain Monte Carlo methods for protein motifs with the secondary characteristics.

Author information

Xie Jun, Kim Nak-Kyeong

Affiliation

Department of Statistics, Purdue University, 150 N. University Street, West Lafayette, IN 47907-2067, USA.

Publication information

J Comput Biol. 2005 Sep;12(7):952-70. doi: 10.1089/cmb.2005.12.952.

Abstract

Statistical methods have been developed for finding local patterns, also called motifs, in multiple protein sequences. The aligned segments may indicate functional or structural core regions. However, existing methods often have difficulty aligning multiple proteins when sequence identity is low (e.g., less than 25%). In this article, we develop a Bayesian model and Markov chain Monte Carlo (MCMC) methods for identifying subtle motifs in protein sequences. Specifically, a motif is defined not only in terms of specific sites characterized by amino acid frequency vectors, but also as a combination of secondary characteristics such as hydrophobicity and polarity. MCMC methods are proposed to search for a motif pattern with high posterior probability under the new model. A special MCMC algorithm is developed, involving transitions between state spaces of different dimensions. The proposed methods were supported by a simulation study and then tested on two real datasets: a group of helix-turn-helix proteins and a set from the CATH Protein Structure Classification Database. Statistical comparisons showed that the new approach performed better than a typical Gibbs sampling approach based only on an amino acid model.
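
For readers unfamiliar with the baseline the abstract compares against, the following is a minimal Python sketch of a Gibbs site sampler that uses an amino acid model only. It is not the authors' method: it omits the secondary characteristics (hydrophobicity, polarity) and the dimension-changing MCMC moves. It assumes exactly one motif occurrence of fixed width per sequence, sequences made of the 20 standard residues, and a pseudocount of 1; the names `build_model` and `gibbs_site_sampler` are hypothetical.

```python
import math
import random

# Hypothetical sketch of an amino-acid-only Gibbs site sampler (not the
# Xie & Kim model). Assumes one fixed-width motif per sequence and only
# the 20 standard residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_model(seqs, starts, width, skip, pseudo=1.0):
    """Estimate motif column and background residue frequencies from every
    sequence except index `skip`; pseudocounts keep all frequencies > 0."""
    cols = [{a: pseudo for a in AMINO_ACIDS} for _ in range(width)]
    bg = {a: pseudo for a in AMINO_ACIDS}
    for i, (seq, st) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j, res in enumerate(seq):
            if st <= j < st + width:
                cols[j - st][res] += 1  # residue inside the current motif window
            else:
                bg[res] += 1  # residue outside the motif counts as background
    # Normalize counts to frequencies (mutates the dicts held in cols and bg).
    for freqs in cols + [bg]:
        total = sum(freqs.values())
        for a in AMINO_ACIDS:
            freqs[a] /= total
    return cols, bg


def gibbs_site_sampler(seqs, width, iterations=200, seed=0):
    """Resample each sequence's motif start given all the others;
    returns the 0-based start positions after the final sweep."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            cols, bg = build_model(seqs, starts, width, skip=i)
            # Log likelihood ratio (motif model vs. background) for each window.
            log_w = [
                sum(math.log(cols[k][seq[p + k]] / bg[seq[p + k]])
                    for k in range(width))
                for p in range(len(seq) - width + 1)
            ]
            top = max(log_w)
            weights = [math.exp(x - top) for x in log_w]  # shift for numerical stability
            # Draw the new start with probability proportional to its likelihood ratio.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts
```

For example, `gibbs_site_sampler(seqs, width=6)` returns one 0-based start index per sequence. Because a single chain can remain in a local mode, one would typically run several chains with different seeds and keep the alignment with the highest likelihood ratio.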
