Targeting novel folds for structural genomics.

Author information

McGuffin Liam J, Jones David T

Affiliations

Institute of Cancer Genetics and Pharmacogenomics, Department of Biological Sciences, Brunel University, Uxbridge, Middlesex, United Kingdom.

Publication information

Proteins. 2002 Jul 1;48(1):44-52. doi: 10.1002/prot.10129.

PMID:12012336

Abstract

The ultimate goal of structural genomics is to obtain the structure of each protein coded by each gene within a genome to determine gene function. Because of cost and time limitations, it remains impractical to solve the structure for every gene product experimentally. Up to a point, reasonably accurate three-dimensional structures can be deduced for proteins with homologous sequences by using comparative modeling. Beyond this, fold recognition or threading methods can be used for proteins showing little homology to any known fold, although this is relatively time-consuming and limited by the library of template folds currently available. Therefore, it is appropriate to develop methods that can increase our knowledge base, expanding our fold libraries by earmarking potentially "novel" folds for experimental structure determination. How can we sift through proteomic data rapidly and yet reliably identify novel folds as targets for structural genomics? We have analyzed a number of simple methods that discriminate between "novel" and "known" folds. We propose that simple alignments of secondary structure elements using predicted secondary structure could potentially be a more selective method than both a simple fold recognition method (GenTHREADER) and standard sequence alignment at finding novel folds when sequences show no detectable homology to proteins with known structures.
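The idea of aligning secondary structure elements (SSEs) to flag possible novel folds can be illustrated with a small sketch. The abstract does not give the authors' actual procedure, so everything below is an assumption made for illustration. It assumes a three-state (H/E/C) predicted secondary structure string, collapses it into a string of helices and strands using hypothetical minimum element lengths, and scores it against a hypothetical template library with a plain Needleman-Wunsch global alignment. The scoring values, the normalization, the 0.6 threshold, and the function names `to_sse_string`, `align_score`, `normalized_score` and `flag_novel` are all illustrative choices. None of them come from the paper.

```python
from typing import Dict, Optional, Tuple


def to_sse_string(ss: str, min_len: Optional[Dict[str, int]] = None) -> str:
    """Collapse a per-residue H/E/C string into one symbol per SSE.

    Coil ('C') runs are dropped, as are helices or strands shorter than the
    (assumed, hypothetical) minimum lengths in `min_len`.
    """
    if min_len is None:
        min_len = {"H": 4, "E": 3}
    elements = []
    i = 0
    while i < len(ss):
        j = i
        while j < len(ss) and ss[j] == ss[i]:
            j += 1  # extend the run of identical states
        state = ss[i]
        if state in min_len and j - i >= min_len[state]:
            elements.append(state)
        i = j
    return "".join(elements)


def align_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[len(b)]


def normalized_score(a: str, b: str, match: int = 2) -> float:
    """Scale the score by the best possible score for the longer string."""
    best_possible = match * max(len(a), len(b))
    return align_score(a, b, match=match) / best_possible if best_possible else 0.0


def flag_novel(query_ss: str, library: Dict[str, str],
               threshold: float = 0.6) -> Tuple[bool, str, float]:
    """Return (is_potentially_novel, best_template, best_score).

    `library` maps hypothetical template fold names to SSE strings such as "HEEH".
    """
    query = to_sse_string(query_ss)
    best_name, best = "", float("-inf")
    for name, template in library.items():
        score = normalized_score(query, template)
        if score > best:
            best_name, best = name, score
    return best < threshold, best_name, best
```

For example, with the hypothetical library `{"foldA": "HEEH", "foldB": "EEEEEE"}`, the query `"CCHHHHHHHCCEEEEECCEEEEECCHHHHHHHHCC"` collapses to `"HEEH"` and matches `foldA` exactly, so it is not flagged. A single long strand, `"EEEEEEEEEE"`, matches no template well and is flagged as a potential novel fold. Working on SSE strings rather than residues keeps each comparison very cheap, which fits the abstract's emphasis on sifting proteomic data rapidly.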
