Guo Jun-tao, Ellrott Kyle, Chung Won Jae, Xu Dong, Passovets Serguei, Xu Ying
Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30606, USA.
Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W522-5. doi: 10.1093/nar/gkh414.
Knowledge of the detailed structure of a protein is crucial to our understanding of the biological functions of that protein. The gap between the number of solved protein structures and the number of protein sequences continues to widen rapidly in the post-genomics era because solving structures experimentally is slow and expensive. Computational prediction of structures from amino acid sequences has come to play a key role in narrowing the gap and has been successful in providing useful information for the biological research community. We have developed a prediction pipeline, PROSPECT-PSPP, which integrates multiple computational tools for fully automated protein structure prediction. The pipeline consists of tools for (i) preprocessing of protein sequences, which includes signal peptide prediction, protein type prediction (membrane or soluble) and protein domain partition, (ii) secondary structure prediction, (iii) fold recognition and (iv) atomic structural model generation. The centerpiece of the pipeline is our threading-based program PROSPECT. The pipeline is implemented using SOAP (Simple Object Access Protocol), which makes it easier to share our tools and resources. The pipeline has an easy-to-use user interface and is implemented on a 64-node dual-processor Linux cluster. It can be used for genome-scale protein structure prediction. The pipeline is accessible at http://csbl.bmb.uga.edu/protein_pipeline.
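The four-stage organization described in the abstract can be illustrated with a minimal Python sketch. This is not the PROSPECT-PSPP implementation: the `Job` class, the stage functions, and the result keys are hypothetical names introduced here, and each stage is a placeholder that records an empty result instead of calling a real prediction tool. The sketch only shows how an ordered list of stages could pass one query sequence along, with each stage reading what earlier stages produced.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Job:
    """One query sequence plus the results accumulated stage by stage (hypothetical)."""
    sequence: str
    results: Dict[str, Any] = field(default_factory=dict)


def preprocess(job: Job) -> None:
    # Stage (i): placeholder for signal peptide prediction, protein type
    # prediction (membrane or soluble) and domain partition.
    # Assumption: the whole sequence is treated as a single domain.
    job.results["preprocess"] = {
        "signal_peptide": None,
        "protein_type": None,
        "domains": [job.sequence],
    }


def predict_secondary_structure(job: Job) -> None:
    # Stage (ii): placeholder that labels every residue as unknown ("?").
    domains = job.results["preprocess"]["domains"]
    job.results["secondary_structure"] = ["?" * len(d) for d in domains]


def recognize_fold(job: Job) -> None:
    # Stage (iii): placeholder; a real stage would run a threading program
    # per domain and record the best-scoring template.
    domains = job.results["preprocess"]["domains"]
    job.results["fold_recognition"] = [None for _ in domains]


def generate_model(job: Job) -> None:
    # Stage (iv): placeholder; a real stage would build atomic coordinates
    # from each sequence-template alignment.
    templates = job.results["fold_recognition"]
    job.results["model_generation"] = [None for _ in templates]


# Ordered stage list; the order mirrors (i) to (iv) in the abstract.
STAGES: List[Tuple[str, Callable[[Job], None]]] = [
    ("preprocess", preprocess),
    ("secondary_structure", predict_secondary_structure),
    ("fold_recognition", recognize_fold),
    ("model_generation", generate_model),
]


def run_pipeline(sequence: str) -> Job:
    """Run every stage in order on one amino acid sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    job = Job(sequence=sequence)
    for _name, stage in STAGES:
        stage(job)
    return job


if __name__ == "__main__":
    finished = run_pipeline("MKTAYIAKQR")
    for stage_name, value in finished.results.items():
        print(stage_name, value)
```

Keeping the stages in a plain ordered list makes it straightforward to swap one placeholder for a call to a real tool or remote service without touching the others.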