Zintzaras Elias
Department of Biomathematics, University of Thessaly School of Medicine, Larissa, Greece.
Comput Biol Med. 2008 Apr;38(4):469-74. doi: 10.1016/j.compbiomed.2008.01.006. Epub 2008 Mar 4.
A methodology for testing the correlation between the sequence and structure distances of proteins is proposed. Structure distances were derived by applying a forward-growing classification tree algorithm to defined physico-chemical and geometrical properties of the structures. The structure distance for every pair of proteins was defined as the number of intermediate nodes in the tree. Sequence distances were derived using pairwise sequence alignment. The correlation between the sequence distance matrix and the structure distance matrix was then tested using a Monte Carlo permutation test. The results were compared to those obtained when the double dynamic structure alignment method (SSAP) was applied. The methodology was applied to a data set of 74 proteins belonging to 14 families. The classification tree was able to identify the protein families (the misclassification rate was R=1.4%), and a 74x74 structure distance matrix was produced. For every pair of protein sequences a dissimilarity score was recorded, and a sequence distance matrix was produced. The Monte Carlo permutation test produced a correlation coefficient r=0.403 (P<0.001). The SSAP method produced similar results. The proposed methodology may assist in assessing whether protein sequence distances can be predictors of protein structure distances.
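The matrix-correlation step can be illustrated with a minimal sketch of a Mantel-style Monte Carlo permutation test. The abstract does not say which correlation coefficient, how many permutations, or which permutation scheme was used, so the Pearson coefficient, the default of 999 permutations, the joint row-and-column shuffling, and the function names `upper_triangle` and `permutation_correlation_test` are all assumptions made for illustration, not the author's implementation.

```python
import numpy as np


def upper_triangle(matrix):
    """Return the strictly upper-triangular entries of a square matrix as a 1-D array."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


def permutation_correlation_test(seq_dist, struct_dist, n_permutations=999, seed=0):
    """Mantel-style test of the correlation between two distance matrices.

    Assumed (not stated in the abstract): Pearson correlation, a one-sided
    test for positive association, and 999 random permutations.
    Returns (observed correlation, permutation P-value).
    """
    seq_dist = np.asarray(seq_dist, dtype=float)
    struct_dist = np.asarray(struct_dist, dtype=float)
    if seq_dist.ndim != 2 or seq_dist.shape[0] != seq_dist.shape[1]:
        raise ValueError("seq_dist must be a square matrix")
    if seq_dist.shape != struct_dist.shape:
        raise ValueError("both matrices must have the same shape")

    rng = np.random.default_rng(seed)
    n = seq_dist.shape[0]
    struct_vec = upper_triangle(struct_dist)
    observed = np.corrcoef(upper_triangle(seq_dist), struct_vec)[0, 1]

    exceed = 0
    for _ in range(n_permutations):
        # Relabel the proteins: shuffle rows and columns of one matrix together.
        perm = rng.permutation(n)
        permuted = seq_dist[np.ix_(perm, perm)]
        r = np.corrcoef(upper_triangle(permuted), struct_vec)[0, 1]
        if r >= observed:
            exceed += 1

    # Count the observed arrangement as one of the permutations.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling protein labels preserves the matrix structure under the null hypothesis of no association. For the 74x74 matrices described above, the call would take the form `permutation_correlation_test(seq_dist, struct_dist)`, with both arguments being symmetric arrays in the same protein order.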