Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the glutaredoxin/thioredoxin disulfide oxidoreductase activity.

Author information

Fetrow J S, Godzik A, Skolnick J

Affiliation

Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA.

Publication information

J Mol Biol. 1998 Oct 2;282(4):703-11. doi: 10.1006/jmbi.1998.2061.

PMID:9743619

Abstract

The application of an automated method for the screening of protein activity based on the sequence-to-structure-to-function paradigm is presented for the complete Escherichia coli genome. First, the structure of the protein is identified from its sequence using a threading algorithm, which aligns the sequence to the best matching structure in a structural database and extends sequence analysis well beyond the limits of local sequence identity. Then, the active site is identified in the resulting sequence-to-structure alignment using a "fuzzy functional form" (FFF), a three-dimensional descriptor of the active site of a protein. Here, this sequence-to-structure-to-function concept is applied to the analysis of the complete E. coli genome, i.e. all E. coli open reading frames (ORFs) are screened for the thiol-disulfide oxidoreductase activity of the glutaredoxin/thioredoxin protein family. We show that the method can identify the active sites in ten sequences that are known or proposed to exhibit this activity. Furthermore, oxidoreductase activity is predicted in two other sequences that have not been identified previously. This method distinguishes protein pairs with similar active sites from protein pairs that are just topological cousins, i.e. those having similar global folds, but not necessarily similar active sites. Thus, this method provides a novel approach for extraction of active site and functional information based on three-dimensional structures, rather than simple sequence analysis. Prediction of protein activity is fully automated and easily extensible to new functions. Finally, it is demonstrated here that the method can be applied to complete genome database analysis.
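The second step described above, checking a three-dimensional active-site descriptor against a sequence-to-structure alignment, can be illustrated with a minimal Python sketch. The abstract does not give the contents of the FFF, so everything here is an assumption made for illustration: the `Descriptor` class, the `find_site` function, the choice of two cysteines and one proline as descriptor residues, the distance ranges, and the toy coordinates are hypothetical and are not the authors' implementation or parameters. The sketch also assumes that the threading step has already produced an alignment (query index to template position) and that C-alpha coordinates of the template are available.

```python
from dataclasses import dataclass
from itertools import product
from math import dist


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical active-site descriptor (not the published FFF)."""
    residues: dict   # label -> required one-letter amino acid code
    distances: dict  # (label_a, label_b) -> (min, max) C-alpha distance, angstroms


def find_site(descriptor, query_seq, alignment, template_ca):
    """Return {label: query index} for the first placement that satisfies
    the descriptor, or None if no placement does.

    alignment:   dict mapping query index -> template position (gaps omitted)
    template_ca: dict mapping template position -> (x, y, z) C-alpha coordinate
    """
    labels = list(descriptor.residues)
    # Candidate query positions: right residue type and aligned to the template.
    candidates = [
        [q for q, aa in enumerate(query_seq)
         if aa == descriptor.residues[label] and q in alignment]
        for label in labels
    ]
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # one query residue cannot fill two descriptor slots
        placed = dict(zip(labels, combo))
        # Geometry is read from the template positions the query is aligned to.
        if all(
            lo <= dist(template_ca[alignment[placed[a]]],
                       template_ca[alignment[placed[b]]]) <= hi
            for (a, b), (lo, hi) in descriptor.distances.items()
        ):
            return placed
    return None


# Toy data, invented for illustration only.
toy_descriptor = Descriptor(
    residues={"C1": "C", "C2": "C", "P": "P"},
    distances={("C1", "C2"): (4.5, 6.5),
               ("C1", "P"): (4.0, 5.0),
               ("C2", "P"): (4.5, 5.5)},
)
toy_template_ca = {
    0: (0, 0, 0), 1: (3, 0, 0), 2: (6, 0, 0), 3: (9, 2, 0), 4: (9, 5, 0),
    5: (6, 5, 0), 6: (3, 5, 0), 7: (0, 5, 0), 8: (6, 2, 4), 9: (6, 2, 8),
}
toy_alignment = {i: i for i in range(10)}  # gapless alignment, for simplicity

# Same fold and alignment, different residues at the candidate site.
hit = find_site(toy_descriptor, "AACGPCAAPA", toy_alignment, toy_template_ca)
miss = find_site(toy_descriptor, "AASGPSAAPA", toy_alignment, toy_template_ca)
print(hit, miss)
```

In this toy case both query sequences are aligned to the same template, so they share a fold, but only the first carries the residues required by the descriptor. This mirrors the distinction the abstract draws between proteins with similar active sites and proteins that are merely topological cousins.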
