Striped sheets and protein contact prediction.

Author information

MacCallum Robert M

Affiliation

Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden.

Publication information

Bioinformatics. 2004 Aug 4;20 Suppl 1:i224-31. doi: 10.1093/bioinformatics/bth913.

Abstract

MOTIVATION

Current approaches to contact map prediction in proteins have focused on amino acid conservation and patterns of mutation at sequentially distant positions. This sequence information is poorly understood and very little progress has been made in this area during recent years.

RESULTS

In this study, an observation of 'striped' sequence patterns across beta-sheets prompted the development of a new type of contact map predictor. Computer program code was evolved with an evolutionary algorithm (genetic programming) to select residues and residue pairs likely to make contacts based solely on local sequence patterns extracted with the help of self-organizing maps. The mean prediction accuracy is 27% on a validation set of 156 domains up to 400 residues in length, where contacts are separated by at least 8 residues and length/10 pairs are predicted. The retrospective accuracy on a set of 15 CASP5 targets is 27% and 14% for length/10 and length/2 predicted pairs, respectively (both using a minimum residue separation of 24). This compares favourably to the equivalent 21% and 13% obtained for the best automated contact prediction methods at CASP5. The results suggest that protein architectures impose regularities in local sequence environments. Other sources of information, such as correlated/compensatory mutations, may further improve accuracy.
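The accuracy figures above depend on two parameters: how many top-ranked pairs are counted (length/10 or length/2) and the minimum sequence separation (8 or 24). The abstract does not spell out the formula, so the sketch below assumes the usual convention that accuracy is the fraction of selected predicted pairs that are true contacts. The function name, argument names, and toy data are hypothetical and are not taken from the paper or its web service.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def top_pairs_accuracy(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    length: int,
    divisor: int = 10,
    min_separation: int = 8,
) -> float:
    """Fraction of the top length/divisor predicted pairs that are true contacts.

    Assumption: accuracy = correct predictions / predictions made, which is
    the common convention but is not defined in the abstract itself.
    """
    # Keep only pairs whose sequence separation meets the minimum.
    eligible = [
        (score, pair)
        for pair, score in scores.items()
        if abs(pair[0] - pair[1]) >= min_separation
    ]
    # Rank by predicted score, highest first, and keep length/divisor pairs.
    eligible.sort(key=lambda item: item[0], reverse=True)
    n_predicted = max(1, length // divisor)
    selected = [pair for _, pair in eligible[:n_predicted]]
    if not selected:
        return 0.0
    # Treat (i, j) and (j, i) as the same contact.
    truth = {tuple(sorted(pair)) for pair in true_contacts}
    hits = sum(1 for pair in selected if tuple(sorted(pair)) in truth)
    # Divide by the number of pairs actually selected, which can be smaller
    # than length/divisor when few pairs pass the separation filter.
    return hits / len(selected)


# Toy data for a hypothetical 40-residue domain (not from the paper).
example_scores = {
    (1, 20): 0.9, (2, 30): 0.8, (3, 5): 0.95,
    (4, 15): 0.7, (5, 25): 0.6, (6, 36): 0.5,
}
example_truth = {(1, 20), (4, 15), (6, 36)}

# length/10 = 4 pairs; (3, 5) is dropped because its separation is below 8.
print(top_pairs_accuracy(example_scores, example_truth, length=40))
```

With the toy data, two of the four selected pairs are true contacts, so the call prints 0.5. Raising `min_separation` to 24 or setting `divisor=2` reproduces the two evaluation settings quoted for the CASP5 targets.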

AVAILABILITY

A web-based prediction service is available at http://www.sbc.su.se/~maccallr/contactmaps
