Laboratory of Computational Biology, Centre for DNA Fingerprinting and Diagnostics, Uppal, Hyderabad, Telangana, 500039, India.
Graduate School, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India.
Sci Rep. 2019 Nov 8;9(1):16380. doi: 10.1038/s41598-019-52532-8.
An amino acid substitution scoring matrix encapsulates the rates at which amino acid residues in proteins are substituted by other residues over time. Database search methods use substitution scoring matrices to identify homologous sequences. However, widely used substitution scoring matrices, such as the BLOSUM series, were developed from aligned blocks that are mostly devoid of disordered regions. These matrices are therefore poorly suited to homology searches involving proteins enriched in disordered regions, because disordered regions have a distinct amino acid compositional bias and are expected to have undergone amino acid substitutions distinct from those in ordered regions. We therefore developed a novel series of substitution scoring matrices, referred to as EDSSMat, by exclusively considering the substitution frequencies of amino acids in the disordered regions of eukaryotic proteins. The newly developed matrices were tested for their ability to detect homologs of proteins enriched in disordered regions using the SSEARCH tool. The results unequivocally demonstrate that EDSSMat matrices detect more homologs than the widely used BLOSUM, PAM and other standard matrices, indicating their utility for homology searches of intrinsically disordered proteins.
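To make concrete how substitution frequencies become scores, the following is a minimal sketch of the standard BLOSUM-style log-odds calculation, applied to a toy set of alignment columns. It is an illustration under stated assumptions, not the EDSSMat construction itself: the function name `log_odds_matrix`, the `scale` parameter, and the input columns are hypothetical, and steps such as sequence clustering and the identification of disordered regions are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_matrix(columns, scale=2.0):
    """Return BLOSUM-style log-odds scores from ungapped alignment columns.

    columns: iterable of strings, each string being one alignment column.
    scale:   multiplier applied to the log2 odds ratio (2.0 gives half-bit
             units); a hypothetical parameter chosen for this sketch.
    The result maps an alphabetically sorted residue pair (a, b) to an
    integer score. Pairs never observed in the input are absent.
    """
    # Count every unordered residue pair within each column.
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    total = sum(pair_counts.values())
    # Observed pair frequencies q_ab.
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background residue frequencies p_a = q_aa + sum over b != a of q_ab / 2.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    # Score = scale * log2(observed / expected), rounded to an integer.
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(scale * math.log2(freq / expected))
    return scores


if __name__ == "__main__":
    # Toy columns (assumed data): A and S are mostly conserved.
    toy_columns = ["AAAA", "AAAS", "SSSS", "SSSA"]
    for pair, score in sorted(log_odds_matrix(toy_columns).items()):
        print(pair, score)
```

A positive score marks a pair observed more often than the background composition predicts, and a negative score marks a pair observed less often. Because the scores depend on both the pair frequencies and the background composition, a matrix derived from compositionally biased regions can differ from one derived from ordered blocks.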