GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences.

Author information

Jones D T

Affiliation

Department of Biological Sciences, University of Warwick, Coventry, CV4 7AL, UK.

Publication information

J Mol Biol. 1999 Apr 9;287(4):797-815. doi: 10.1006/jmbi.1999.2583.

PMID:10191147

Abstract

A new protein fold recognition method is described which is both fast and reliable. The method uses a traditional sequence alignment algorithm to generate alignments, which are then evaluated by a method derived from threading techniques. As a final step, each threaded model is evaluated by a neural network in order to produce a single measure of confidence in the proposed prediction. The speed of the method, along with its sensitivity and very low false-positive rate, makes it ideal for automatically predicting the structure of all the proteins in a translated bacterial genome (proteome). The method has been applied to the genome of Mycoplasma genitalium, and analysis of the results shows that as many as 46% of the proteins derived from the predicted protein coding regions have a significant relationship to a protein of known structure. In some cases, however, only one domain of the protein can be predicted, giving a total coverage of 30% when calculated as a fraction of the number of amino acid residues in the whole proteome.
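The abstract reports two coverage figures, one counted per protein and one counted per residue. The gap between them follows from partial matches, where only one domain of a protein is assigned a fold. The minimal sketch below illustrates that arithmetic. The `Protein` class, both function names, and the toy numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    length: int            # number of residues in the protein
    matched_residues: int  # residues inside a confident fold match (0 if none)


def protein_coverage(proteome):
    """Fraction of proteins with at least one confidently matched region."""
    hits = sum(1 for p in proteome if p.matched_residues > 0)
    return hits / len(proteome)


def residue_coverage(proteome):
    """Fraction of all residues that fall inside a confidently matched region."""
    matched = sum(p.matched_residues for p in proteome)
    total = sum(p.length for p in proteome)
    return matched / total


# Hypothetical toy proteome, not data from the paper.
toy = [
    Protein(length=200, matched_residues=200),  # whole protein matched
    Protein(length=600, matched_residues=150),  # only one domain matched
    Protein(length=300, matched_residues=0),    # no match
    Protein(length=400, matched_residues=0),    # no match
]

print(f"protein-level coverage: {protein_coverage(toy):.2f}")  # 0.50
print(f"residue-level coverage: {residue_coverage(toy):.2f}")  # 0.23
```

In this toy case half of the proteins have a match, yet under a quarter of the residues are covered, because the long second protein is matched over a single domain only.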
