A computational pipeline for protein structure prediction and analysis at genome scale.

Authors

Shah Manesh, Passovets Sergei, Kim Dongsup, Ellrott Kyle, Wang Li, Vokler Inna, LoCascio Philip, Xu Dong, Xu Ying

Affiliation

Life Sciences Division, Oak Ridge National Laboratory, TN 37830-6480, USA.

Publication information

Bioinformatics. 2003 Oct 12;19(15):1985-96. doi: 10.1093/bioinformatics/btg262.

DOI:10.1093/bioinformatics/btg262

PMID:14555633

Abstract

MOTIVATION

Experimental techniques alone cannot keep up with the rate at which protein sequences are produced, while computational techniques for protein structure prediction have matured enough to provide reliable structural characterization of proteins at large scale. Integrating multiple computational tools for protein structure prediction can complement experimental techniques.

RESULTS

We present an automated pipeline for protein structure prediction. The centerpiece of the pipeline is our threading-based protein structure prediction system PROSPECT. The pipeline consists of a dozen tools for identification of protein domains and signal peptide, protein triage to determine the protein type (membrane or globular), protein fold recognition, generation of atomic structural models, prediction result validation, etc. Different processing and prediction branches are determined automatically by a prediction pipeline manager based on identified characteristics of the protein. The pipeline has been implemented to run in a heterogeneous computational environment as a client/server system with a web interface. Genome-scale applications on Caenorhabditis elegans, Pyrococcus furiosus and three cyanobacterial genomes are presented.
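The branch selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature fields, the stage names, and the decision to stop after a membrane-specific step are all assumptions made for illustration, since the abstract does not specify how the pipeline manager handles each protein type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinFeatures:
    """Hypothetical summary of what upstream tools reported for one sequence."""
    has_signal_peptide: bool
    is_membrane: bool
    n_domains: int


def plan_branch(features: ProteinFeatures) -> List[str]:
    """Return an ordered list of hypothetical stage names for one protein."""
    steps: List[str] = []
    if features.has_signal_peptide:
        # Assumption: a detected signal peptide is removed before further analysis.
        steps.append("cleave_signal_peptide")
    if features.is_membrane:
        # Assumption: membrane proteins take a separate branch and skip threading.
        steps.append("membrane_protein_analysis")
        return steps
    if features.n_domains > 1:
        # Assumption: multi-domain proteins are split so each domain is handled alone.
        steps.append("split_into_domains")
    # Globular branch: fold recognition, model building, then validation.
    steps += ["fold_recognition", "build_atomic_model", "validate_prediction"]
    return steps


if __name__ == "__main__":
    print(plan_branch(ProteinFeatures(False, False, 1)))
```

Keeping the decision logic in one pure function that returns a list of stage names separates "which branch" from "how each tool is run", which is one plausible way to organize a manager of this kind.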

AVAILABILITY

The pipeline is available at http://compbio.ornl.gov/proteinpipeline/
