A structural alphabet for local protein structures: improved prediction methods.

Author information

Etchebest Catherine, Benros Cristina, Hazout Serge, de Brevern Alexandre G

Affiliation

Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Denis DIDEROT-Paris, France.

Publication information

Proteins. 2005 Jun 1;59(4):810-27. doi: 10.1002/prot.20458.

Abstract

Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. With an optimization procedure, the Q16 prediction rate reaches 40.7%. This article examines two aspects of PBs. First, we determine the effect of enlarging the databanks on the definition of PBs. The results show that the geometrical features of the different PBs are preserved (local RMSD of 0.41 Å on average) and that sequence-structure specificities are reinforced when the databanks are enlarged. Second, we improve the methods for predicting PBs from sequences, revisiting the optimization procedure and exploring different local prediction strategies. A statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8 percentage points (Q16 = 48.7%). Repetitive structures are recognized better without losing prediction efficiency for the other local folds. Adding secondary structure prediction improved Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD between predicted PBs and true local structures, is proposed to estimate prediction quality. Neq is linearly correlated with the Q16 prediction rate distributions computed for a large set of proteins. From this relation, an "expected" prediction rate QE16 is deduced with a mean error of 5%.
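
The two quality measures named in the abstract can be illustrated with a minimal sketch. It assumes that Q16 is the percentage of residues whose predicted PB equals the observed PB, with the 16 PBs labelled a to p, and that unassigned positions (for example chain termini, marked here with the hypothetical placeholder 'Z') are ignored. It also assumes that Neq is computed, as in earlier PB work, as the exponential of the Shannon entropy of a position's PB probability vector, so that it ranges from 1 (one certain PB) to 16 (all PBs equally likely). The function names q16 and neq are illustrative, not taken from the authors' software, and the linear Neq to QE16 relation is not reproduced because its coefficients are not given here.

```python
import math
from typing import Sequence

# The 16 Protein Blocks, conventionally labelled a..p (assumption for this sketch).
PB_LETTERS = "abcdefghijklmnop"


def q16(predicted: str, observed: str) -> float:
    """Percentage of assigned positions whose predicted PB matches the observed PB.

    Positions whose observed label is not one of the 16 PBs (e.g. a
    hypothetical 'Z' placeholder for unassigned termini) are skipped.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed PB strings must have equal length")
    pairs = [(p, o) for p, o in zip(predicted, observed) if o in PB_LETTERS]
    if not pairs:
        raise ValueError("no positions with an assigned PB")
    hits = sum(p == o for p, o in pairs)
    return 100.0 * hits / len(pairs)


def neq(probs: Sequence[float]) -> float:
    """Equivalent number of PBs: exp of the Shannon entropy of a PB probability vector.

    The vector is normalised first, so raw scores or counts are also accepted.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in probs:
        if p > 0:
            q = p / total
            entropy -= q * math.log(q)
    return math.exp(entropy)


if __name__ == "__main__":
    print(q16("abcd", "abce"))              # 3 of 4 positions match
    print(neq([1.0] + [0.0] * 15))          # one certain PB
    print(neq([1.0 / 16] * 16))             # uniform over the 16 PBs, about 16
```

A low Neq means the predictor concentrates its probability on few PBs, which is why such an index can serve as a per-protein estimate of how reliable a prediction is likely to be.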