Lin K, May A C, Taylor W R
Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill NW7 1AA, United Kingdom.
J Comput Biol. 2001;8(5):471-81. doi: 10.1089/106652701753216495.
An amino acid substitution matrix specifies the substitution probability for each pair of the 20 amino acids. Log-odds scores derived from the values in substitution matrices are widely used to construct protein sequence alignments. Each substitution matrix is suited to matching sequences that have diverged by a particular evolutionary distance. However, for a given set of sequences, it is not always clear which matrix should be used. We used an artificial neural network model, trained on alignment samples at different evolutionary distances, to predict amino acid substitution probabilities. From this internal description, substitution matrices suited to detecting relationships at any chosen evolutionary distance can be generated instantly. Because it uses evolutionary distance as additional information, our neural network model achieves a lower average cross-entropy error than a series of BLOSUM and PET matrices on all test sets. Our model is more accurate at predicting amino acid substitution probabilities.
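The abstract relies on two standard quantities: log-odds scores derived from substitution probabilities, and the average cross-entropy error used to compare models. The sketch below illustrates both under stated assumptions. It takes a symmetric joint pair-frequency table q_ij, derives background frequencies p_i = Σ_j q_ij, and scores each pair as round(2 · log2(q_ij / (p_i p_j))), i.e. in half-bit units as in BLOSUM. For the error it assumes conditional substitution probabilities P(j | i) = q_ij / p_i and averages −ln P(j | i) over observed aligned pairs. The three-letter toy alphabet, the function names and this exact form of the cross-entropy are hypothetical illustrations, not the authors' neural network or their published error function.

```python
import math
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (residue i, residue j) observed in an aligned column


def background_freqs(joint: Dict[Pair, float]) -> Dict[str, float]:
    """Marginal frequency p_i = sum_j q_ij of each residue in a symmetric joint table."""
    freqs: Dict[str, float] = {}
    for (a, _b), q in joint.items():
        freqs[a] = freqs.get(a, 0.0) + q
    return freqs


def log_odds_scores(joint: Dict[Pair, float], scale: float = 2.0) -> Dict[Pair, int]:
    """Integer scores s_ij = round(scale * log2(q_ij / (p_i * p_j))).

    scale=2.0 expresses scores in half-bit units (BLOSUM-style rounding).
    """
    p = background_freqs(joint)
    return {
        (a, b): round(scale * math.log2(q / (p[a] * p[b])))
        for (a, b), q in joint.items()
    }


def conditional_probs(joint: Dict[Pair, float]) -> Dict[Pair, float]:
    """Substitution probabilities P(j | i) = q_ij / p_i (each row sums to 1)."""
    p = background_freqs(joint)
    return {(a, b): q / p[a] for (a, b), q in joint.items()}


def average_cross_entropy(pairs: Iterable[Pair], cond: Dict[Pair, float]) -> float:
    """Mean of -ln P(j | i) over observed aligned pairs (lower is better)."""
    observed_pairs = list(pairs)
    return -sum(math.log(cond[pair]) for pair in observed_pairs) / len(observed_pairs)


# Hypothetical 3-residue toy alphabet; a real matrix covers all 20 amino acids.
toy_joint = {
    ("A", "A"): 0.30, ("A", "G"): 0.05, ("A", "W"): 0.05,
    ("G", "A"): 0.05, ("G", "G"): 0.25, ("G", "W"): 0.02,
    ("W", "A"): 0.05, ("W", "G"): 0.02, ("W", "W"): 0.21,
}
scores = log_odds_scores(toy_joint)
print(scores[("A", "A")], scores[("G", "W")])  # 2 -4

# Compare the toy model with a uniform model on a few hypothetical observed pairs.
observed = [("A", "A"), ("A", "A"), ("A", "G")]
uniform = {pair: 1 / 3 for pair in toy_joint}
print(average_cross_entropy(observed, conditional_probs(toy_joint))
      < average_cross_entropy(observed, uniform))  # True
```

Identical residues score positively and rare substitutions negatively, and a model whose probabilities better match the observed pairs yields a lower average cross-entropy, which is the sense in which the abstract compares its network against fixed BLOSUM and PET matrices.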