Yang An-Suei, Wang Lu-yong
Department of Pharmacology, Columbia Genome Center, and Center for Computational Biology and Bioinformatics, Columbia University, 630 West 168th Street, PH 7 W Room 318, New York, NY 10032, USA.
Bioinformatics. 2003 Jul 1;19(10):1267-74. doi: 10.1093/bioinformatics/btg151.
A large body of experimental and theoretical evidence suggests that local structural determinants are frequently encoded in short segments of protein sequence. Although local structural information, once recognized, is particularly useful in protein structural and functional analyses, identifying these embedded local structural codes from sequence information alone remains a difficult problem.
In this paper, we describe a local structure prediction method that predicts the backbone structures of nine-residue sequence segments. The procedure rests on two key elements. The first is the LSBSP1 database, which contains a large number of non-redundant local structure-based sequence profiles for nine-residue structure segments. The second is a consensus approach, which identifies a consensus structure from a set of hit structures. The procedure starts by matching a query segment of nine consecutive amino acid residues against every sequence profile in the local structure-based sequence profile database (LSBSP1). The consensus structure, located at the center of the largest structural cluster of the hit structures, is then predicted to be the native-state structure adopted by the query segment. The method is assessed on a large set of randomly selected test protein structures that were not used in constructing the LSBSP1 database. The benchmark results indicate that the new procedure outperforms local backbone structure prediction methods based on the I-sites library by a significant margin.
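The two-step procedure described above (match the query segment against every profile, then take the center of the largest cluster of hit structures) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PrISM.1 implementation: the 9 x 20 log-odds profile layout, the representation of each profile's structure as nine (phi, psi) pairs, the mean-dihedral distance, the 30° cutoff, and the names `Profile`, `predict`, `top_k` and `min_score` are all hypothetical.

```python
"""Sketch of profile matching plus consensus selection for 9-residue segments.

Assumptions (hypothetical, not taken from the paper): each profile is a 9x20
log-odds matrix carrying one representative backbone given as 9 (phi, psi)
pairs in degrees, and structural distance is a mean dihedral difference.
"""
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SEGMENT_LEN = 9


@dataclass
class Profile:
    # Hypothetical record: scoring matrix plus a representative backbone.
    name: str
    matrix: list[list[float]]  # matrix[position][amino-acid index]
    backbone: list[tuple[float, float]]  # (phi, psi) per residue, degrees


def score_segment(segment: str, profile: Profile) -> float:
    """Sum the position-specific scores of the query residues."""
    if len(segment) != SEGMENT_LEN:
        raise ValueError("query segment must have exactly 9 residues")
    return sum(profile.matrix[i][AMINO_ACIDS.index(aa)]
               for i, aa in enumerate(segment))


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def backbone_distance(x, y) -> float:
    """Mean absolute (phi, psi) difference over the segment (assumed metric)."""
    total = sum(angle_diff(p1, p2) + angle_diff(s1, s2)
                for (p1, s1), (p2, s2) in zip(x, y))
    return total / (2 * len(x))


def consensus_structure(hits, cutoff: float = 30.0):
    """Return the hit with the most neighbors within `cutoff` degrees.

    This greedy medoid rule stands in for "center of the largest cluster".
    """
    best, best_count = None, -1
    for h in hits:
        count = sum(1 for other in hits
                    if backbone_distance(h.backbone, other.backbone) <= cutoff)
        if count > best_count:  # strict '>' keeps the first hit on ties
            best, best_count = h, count
    return best


def predict(segment: str, profiles, top_k: int = 25, min_score: float = 0.0):
    """Score all profiles, keep the top hits, return the consensus backbone."""
    scored = [(score_segment(segment, p), p) for p in profiles]
    ranked = sorted(scored, key=lambda t: t[0], reverse=True)
    hits = [p for s, p in ranked if s >= min_score][:top_k]
    if not hits:
        return None  # no profile matched well enough
    return consensus_structure(hits).backbone
```

With two helix-like hits and one strand-like hit, the neighbor count favors the helical group, so `predict` returns a helical backbone. Counting neighbors avoids a separate clustering pass; the clustering criterion and structural distance actually used for LSBSP1 may well differ.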
All computational and assessment procedures have been implemented in the integrated computational system PrISM.1 (Protein Informatics System for Modeling). The system and associated databases for Linux can be downloaded from http://www.columbia.edu/~ay1/.