A conditional neural fields model for protein threading.

Affiliation

Toyota Technological Institute at Chicago, IL 60637, USA.

Publication information

Bioinformatics. 2012 Jun 15;28(12):i59-66. doi: 10.1093/bioinformatics/bts213.

Abstract

MOTIVATION

Alignment errors are still the main bottleneck for current template-based protein modeling (TM) methods, including protein threading and homology modeling, especially when the sequence identity between two proteins under consideration is low (<30%).

RESULTS

We present a novel protein threading method, CNFpred, which achieves much more accurate sequence-template alignment by employing a probabilistic graphical model called a Conditional Neural Field (CNF), which aligns one protein sequence to its remote template using a non-linear scoring function. This scoring function accounts for correlation among a variety of protein sequence and structure features, makes use of information in the neighborhood of two residues to be aligned, and is thus much more sensitive than the widely used linear or profile-based scoring function. To train this CNF threading model, we employ a novel quality-sensitive method, instead of the standard maximum-likelihood method, to maximize directly the expected quality of the training set. Experimental results show that CNFpred generates significantly better alignments than the best profile-based and threading methods on several public (but small) benchmarks as well as our own large dataset. CNFpred outperforms others regardless of the lengths or classes of proteins, and works particularly well for proteins with sparse sequence profiles due to the effective utilization of structure information. Our methodology can also be adapted to protein sequence alignment.
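To make the idea of a non-linear, neighborhood-aware scoring function concrete, the following is a minimal sketch and not the CNFpred implementation. It scores each candidate residue pair with a one-hidden-layer network over a window of features and then finds the best-scoring global alignment by dynamic programming. Every name here (`window_features`, `nonlinear_score`, `align`), the random untrained weights, the toy feature vectors, and the fixed linear gap penalty are assumptions for illustration. The abstract describes a probabilistic model trained with a quality-sensitive objective, which this sketch does not attempt to reproduce.

```python
import math
import random


def window_features(seq_feats, tmpl_feats, i, j, half_window=1):
    """Concatenate features around sequence position i and template position j."""
    feats = []
    for d in range(-half_window, half_window + 1):
        for profile, pos in ((seq_feats, i + d), (tmpl_feats, j + d)):
            if 0 <= pos < len(profile):
                feats.extend(profile[pos])
            else:
                feats.extend([0.0] * len(profile[0]))  # zero-pad past either end
    return feats


def nonlinear_score(x, hidden_w, out_w):
    """One hidden layer of logistic units followed by a linear output."""
    hidden = [
        1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, x))))
        for row in hidden_w
    ]
    return sum(w * h for w, h in zip(out_w, hidden))


def align(seq_feats, tmpl_feats, hidden_w, out_w, gap=-0.5, half_window=1):
    """Best-scoring global alignment; returns (score, list of aligned index pairs)."""
    n, m = len(seq_feats), len(tmpl_feats)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0], back[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        dp[0][j], back[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x = window_features(seq_feats, tmpl_feats, i - 1, j - 1, half_window)
            options = [
                (dp[i - 1][j - 1] + nonlinear_score(x, hidden_w, out_w), "diag"),
                (dp[i - 1][j] + gap, "up"),    # sequence residue left unaligned
                (dp[i][j - 1] + gap, "left"),  # template residue left unaligned
            ]
            dp[i][j], back[i][j] = max(options)
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


# Toy usage with random (untrained) weights and made-up 3-dimensional features.
rng = random.Random(0)
dim, n_hidden, half_window = 3, 4, 1
seq_feats = [[rng.random() for _ in range(dim)] for _ in range(5)]
tmpl_feats = [[rng.random() for _ in range(dim)] for _ in range(6)]
in_dim = (2 * half_window + 1) * 2 * dim
hidden_w = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(n_hidden)]
out_w = [rng.uniform(-1, 1) for _ in range(n_hidden)]
score, pairs = align(seq_feats, tmpl_feats, hidden_w, out_w, half_window=half_window)
```

Because the match score depends on a window of neighboring positions and passes through a non-linear hidden layer, it can respond to combinations of features that a position-by-position linear score cannot express. In a real system the weights would be learned from reference alignments rather than drawn at random.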

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0172/3371845/60bd8a9542cb/bts213f1.jpg
