Phylogenetic Gaussian process model for the inference of functionally important regions in protein tertiary structures.

Author information

Huang Yi-Fei, Golding G Brian

Affiliation

Department of Biology, McMaster University, Hamilton, Ontario, Canada.

Publication information

PLoS Comput Biol. 2014 Jan;10(1):e1003429. doi: 10.1371/journal.pcbi.1003429. Epub 2014 Jan 16.

Abstract

A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins, and extraordinarily low substitution rates have been used as evidence of function. Most existing tools, e.g. Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since functionally important sites tend to cluster together to form functional patches. We have developed a new model, GP4Rate, which combines the Gaussian process model with the standard phylogenetic model to identify slowly evolving regions in protein tertiary structures. GP4Rate uses the Gaussian process to define a nonparametric prior distribution of site-specific substitution rates, which naturally captures the spatial correlation of substitution rates. Simulations suggest that GP4Rate can potentially estimate site-specific substitution rates with much higher accuracy than Rate4Site and tends to report slowly evolving regions rather than individual sites. In addition, GP4Rate can estimate the strength of the spatial correlation of substitution rates from the data. By applying GP4Rate to a set of mammalian B7-1 genes, we found a highly conserved region that coincides with experimental evidence. GP4Rate may be a useful tool for the in silico prediction of functionally important regions in proteins with known structures.
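The idea of a Gaussian process prior that ties substitution rates to positions in the tertiary structure can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the squared-exponential kernel, the use of residue coordinates in ångströms, the `length_scale` and `variance` parameters, and the function names `se_kernel` and `sample_latent_rates`. The sketch only shows how such a prior makes spatially close sites more strongly correlated than distant ones; it does not reproduce GP4Rate's actual likelihood or inference procedure.

```python
import numpy as np


def se_kernel(coords, length_scale=8.0, variance=1.0):
    """Squared-exponential covariance between sites, computed from 3D coordinates.

    Assumed kernel choice for illustration; length_scale is a hypothetical
    parameter controlling how quickly correlation decays with distance.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)


def sample_latent_rates(coords, n_samples=1, length_scale=8.0, variance=1.0, seed=0):
    """Draw latent per-site values from a zero-mean GP prior over the structure."""
    rng = np.random.default_rng(seed)
    K = se_kernel(coords, length_scale, variance)
    K = K + 1e-8 * np.eye(len(coords))  # small jitter for numerical stability
    L = np.linalg.cholesky(K)
    z = rng.standard_normal((len(coords), n_samples))
    return (L @ z).T  # shape: (n_samples, n_sites)


# Toy structure: sites 0 and 1 are neighbours, site 2 is far away.
coords = np.array([[0.0, 0.0, 0.0],
                   [3.8, 0.0, 0.0],
                   [40.0, 0.0, 0.0]])
K = se_kernel(coords)
samples = sample_latent_rates(coords, n_samples=5)
```

In this toy setup the prior covariance between the two neighbouring sites is much larger than between either of them and the distant site, which is the property that lets a spatially correlated prior favour slowly evolving regions over isolated sites. A larger `length_scale` would correspond to a stronger spatial correlation.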

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/53e4/3894161/516fdcedecfe/pcbi.1003429.g001.jpg