Jones D T
Department of Biological Sciences, University of Warwick, Coventry, CV4 7AL, UK.
J Mol Biol. 1999 Apr 9;287(4):797-815. doi: 10.1006/jmbi.1999.2583.
A new protein fold recognition method is described that is both fast and reliable. The method uses a traditional sequence alignment algorithm to generate alignments, which are then evaluated by a method derived from threading techniques. As a final step, each threaded model is evaluated by a neural network to produce a single measure of confidence in the proposed prediction. The speed of the method, along with its sensitivity and very low false-positive rate, makes it ideal for automatically predicting the structure of all the proteins in a translated bacterial genome (proteome). The method has been applied to the genome of Mycoplasma genitalium, and analysis of the results shows that as many as 46% of the proteins derived from the predicted protein coding regions have a significant relationship to a protein of known structure. In some cases, however, only one domain of the protein can be predicted, giving a total coverage of 30% when calculated as a fraction of the number of amino acid residues in the whole proteome.
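The gap between the two coverage figures (46% of proteins versus 30% of residues) arises because a protein counts as a hit even when only one of its domains is assigned to a known structure. The following minimal sketch illustrates the two measures. It uses made-up toy data and hypothetical names (`Protein`, `protein_coverage`, `residue_coverage`). It also assumes that residue coverage means residues inside confidently assigned regions divided by all residues in the proteome. It does not reproduce the published method or its data.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str      # hypothetical identifier
    length: int    # total number of residues in the chain
    covered: int   # residues inside confidently assigned regions (0 = no hit)


def protein_coverage(proteins):
    """Fraction of proteins with at least one confidently assigned region."""
    if not proteins:
        raise ValueError("empty proteome")
    hits = sum(1 for p in proteins if p.covered > 0)
    return hits / len(proteins)


def residue_coverage(proteins):
    """Fraction of all residues in the proteome that lie in assigned regions."""
    total = sum(p.length for p in proteins)
    if total == 0:
        raise ValueError("empty proteome")
    return sum(p.covered for p in proteins) / total


# Toy proteome (invented numbers, for illustration only).
toy = [
    Protein("A", 300, 300),  # whole chain matched
    Protein("B", 500, 150),  # only one domain matched
    Protein("C", 200, 0),    # no confident match
    Protein("D", 400, 0),    # no confident match
]

print(f"{protein_coverage(toy):.0%}")  # → 50%
print(f"{residue_coverage(toy):.0%}")  # → 32%
```

Because partially matched proteins such as "B" count fully toward the per-protein figure but only partly toward the per-residue figure, the residue-based coverage is always less than or equal to the protein-based one.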