Bastien Olivier, Roy Sylvaine, Maréchal Eric
Laboratoire de Physiologie Cellulaire Végétale, Département Réponse et Dynamique Cellulaire, UMR 5019, CNRS-CEA-INRA-Université Joseph-Fourier, CEA Grenoble, 17, rue des Martyrs, 38054 Grenoble, France.
C R Biol. 2005 May;328(5):445-53. doi: 10.1016/j.crvi.2005.02.002.
Automatic comparison of compositionally biased genomes, such as that of Plasmodium falciparum, the causative agent of malaria (82% adenosine + thymidine), with genomes of average composition is currently limited. Popular tools such as BLAST require that amino acid distributions be similar in the aligned sequences, yet the P. falciparum genome is so biased that six amino acids account for more than 50% of its protein composition. One reason the comparison methods fail is the compositional difference between the query and subject proteomes, which amino acid substitution matrices do not take into account. This paper introduces a method for deriving substitution matrices, in particular BLOSUM 62, within the framework of information theory. The method allows the construction of non-symmetrical matrices that take into account the non-symmetric amino acid distributions. The dirAtPf family of matrices, which allows the comparison of P. falciparum and A. thaliana, is given as an example. The paper further analyses the resulting matrices within the framework of information theory, supporting the discrimination advantage they bring.
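As a rough illustration of why differing background compositions lead to non-symmetrical scores, the sketch below builds a log-odds matrix from a joint distribution of aligned residue pairs. It is not the dirAtPf construction described in the paper: the three-letter alphabet, the probabilities, and the function names are all invented for the example, and the scoring formula is the generic log-odds form, log2(q_ab / (p_a * p'_b)), where p and p' are the query and subject background frequencies.

```python
import math

# Hypothetical joint probabilities q[a][b]: residue a in the query proteome
# aligned with residue b in the subject proteome. Values are invented and
# deliberately non-symmetric (q[a][b] != q[b][a]).
JOINT = {
    "K": {"K": 0.20, "N": 0.06, "L": 0.02},
    "N": {"K": 0.10, "N": 0.22, "L": 0.03},
    "L": {"K": 0.02, "N": 0.02, "L": 0.33},
}


def marginals(joint):
    """Return (query, subject) background frequencies implied by the joint."""
    query = {a: sum(row.values()) for a, row in joint.items()}
    subject = {}
    for row in joint.values():
        for b, q_ab in row.items():
            subject[b] = subject.get(b, 0.0) + q_ab
    return query, subject


def log_odds(joint):
    """Score each pair in bits: log2 of observed over expected-by-chance."""
    query, subject = marginals(joint)
    return {
        a: {b: math.log2(q_ab / (query[a] * subject[b])) for b, q_ab in row.items()}
        for a, row in joint.items()
    }


def mutual_information(joint):
    """Average score per aligned pair, in bits (relative entropy of the matrix)."""
    scores = log_odds(joint)
    return sum(joint[a][b] * scores[a][b] for a in joint for b in joint[a])


if __name__ == "__main__":
    scores = log_odds(JOINT)
    print("s(K,N) =", round(scores["K"]["N"], 3))
    print("s(N,K) =", round(scores["N"]["K"], 3))
    print("average information (bits):", round(mutual_information(JOINT), 3))
```

Because the query and subject marginals differ, the score for K aligned with N is not the same as the score for N aligned with K, which is the property a symmetric matrix cannot express.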