Lund O, Frimand K, Gorodkin J, Bohr H, Bohr J, Hansen J, Brunak S
Center for Biological Sequence Analysis, The Technical University of Denmark, Lyngby.
Protein Eng. 1997 Nov;10(11):1241-8. doi: 10.1093/protein/10.11.1241.
We predict interatomic Cα distances by two independent data-driven methods. The first method uses statistically derived probability distributions of the pairwise distance between two amino acids, whereas the second is a neural network approach whose input windows take the sequence context of the two residues into account. Both methods are used to predict whether distances in independent test sets are above or below given thresholds. We investigate which distance thresholds yield the most information-rich constraints and, in turn, the optimal performance of the two methods. The predictions are based on a data set derived using a new threshold that defines when sequence similarity implies structural similarity. We show that neural networks predict distances in proteins more accurately than probability density functions, and that the accuracy of the predictions can be further increased by using sequence profiles. A threading method based on the predicted distances is presented. A homepage with software, predictions and data related to this paper is available at http://www.cbs.dtu.dk/services/CPHmodels/.
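As a rough illustration of the first (statistical) method, the sketch below estimates from training examples the probability that the Cα distance between two residue types falls below a threshold, and predicts "below" when that probability exceeds 0.5. Conditioning on sequence separation, the 0.5 decision rule, the tuple data layout and all function names are assumptions made for illustration; this is not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical example layout: (residue_i, residue_j, sequence_separation, calpha_distance_angstrom)
Example = Tuple[str, str, int, float]
Key = Tuple[str, str, int]


def pair_key(a: str, b: str, sep: int) -> Key:
    # Treat the residue pair as unordered, so (A, G) and (G, A) share statistics.
    # Keeping the sequence separation in the key is an assumption of this sketch.
    x, y = sorted((a, b))
    return (x, y, sep)


def train(examples: Iterable[Example], threshold: float) -> Dict[Key, float]:
    """Estimate P(distance < threshold) for each (residue pair, separation) seen in training."""
    below: Dict[Key, int] = defaultdict(int)
    total: Dict[Key, int] = defaultdict(int)
    for a, b, sep, d in examples:
        k = pair_key(a, b, sep)
        total[k] += 1
        if d < threshold:
            below[k] += 1
    return {k: below[k] / total[k] for k in total}


def predict(model: Dict[Key, float], a: str, b: str, sep: int, default: float = 0.5) -> bool:
    """Return True ("below threshold") if the estimated probability exceeds 0.5.

    Pairs never seen in training fall back to `default`; with 0.5 they are predicted "above".
    """
    return model.get(pair_key(a, b, sep), default) > 0.5


def accuracy(model: Dict[Key, float], examples: Iterable[Example], threshold: float) -> float:
    """Fraction of test examples whose above/below label is predicted correctly."""
    data: List[Example] = list(examples)
    correct = sum(predict(model, a, b, sep) == (d < threshold) for a, b, sep, d in data)
    return correct / len(data)
```

Rerunning `train` and `accuracy` for several values of `threshold` is one simple way to explore how the choice of threshold affects performance. The neural network method would replace this lookup table with a learned function of the sequence windows around both residues.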