Qi Enfeng, Fu Can, Zhai Ying, Dong Jianghui
School of Mathematics and Statistics, Guangxi Normal University, Guilin 541000, China.
College of Biotechnology, Guilin Medical University, Guilin 541004, China.
Math Biosci Eng. 2020 Dec 25;18(1):837-850. doi: 10.3934/mbe.2021044.
Based on substrate sequences, we proposed a novel method for comparing sequence similarities among 68 proteases compiled from the MEROPS online database. A rank vector was defined from the frequencies of amino acids at each site of the substrate, with the aim of eliminating differences in order of magnitude between proteases. Without any assumption of homology, we constructed a protease specificity tree that shows a striking clustering of proteases from different evolutionary origins and catalytic types. Compared with other methods, our phylogenetic tree places almost all homologous proteases in small branches and also clusters proteases of the same catalytic type together, which may reflect the genetic relationships among the proteases. Moreover, certain proteases that cluster together may play similar roles in key pathways categorized in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. This method can therefore provide new insights into the shared similarities among proteases and may inspire the design and development of targeted drugs that specifically regulate protease activity.
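The abstract does not give the exact formula for the rank vector, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that each protease's substrates are aligned windows of equal length (for example, the cleavage-site positions listed in MEROPS). It also assumes that, at each site, the 20 standard amino acids are ranked by descending frequency (1 = most frequent, ties broken alphabetically by one-letter code), that proteases are compared by Manhattan distance between rank vectors, and that the tree is built with UPGMA. The names `rank_vector`, `distance`, `upgma`, and the `toy_*` substrate lists are hypothetical illustrations.

```python
from itertools import combinations

# Assumption: only the 20 standard amino acids are ranked; other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def rank_vector(substrates):
    """Hypothetical rank vector: for each site, rank the 20 amino acids by
    descending frequency (1 = most frequent) and concatenate the ranks."""
    if not substrates:
        raise ValueError("need at least one substrate window")
    length = len(substrates[0])
    if any(len(s) != length for s in substrates):
        raise ValueError("all substrate windows must have the same length")
    vector = []
    for site in range(length):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for s in substrates:
            if s[site] in counts:  # skip non-standard residues
                counts[s[site]] += 1
        # sorted() is stable, so equal counts keep the AMINO_ACIDS order
        order = sorted(AMINO_ACIDS, key=lambda aa: -counts[aa])
        ranks = {aa: i + 1 for i, aa in enumerate(order)}
        vector.extend(ranks[aa] for aa in AMINO_ACIDS)
    return vector


def distance(u, v):
    """Manhattan distance between two rank vectors (an assumed metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def upgma(vectors):
    """Average-linkage (UPGMA) clustering; returns a Newick-style string."""
    size = {name: 1 for name in vectors}  # cluster label -> number of leaves
    dist = {frozenset(pair): distance(vectors[pair[0]], vectors[pair[1]])
            for pair in combinations(vectors, 2)}
    while len(size) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        for c in size:
            if c not in pair:
                # size-weighted average of the distances to the two merged clusters
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        dist = {k: v for k, v in dist.items() if not (k & pair)}
    return next(iter(size))


# Toy data (hypothetical two-residue windows, not MEROPS records).
toy_A = ["AK", "GK", "AR"]
toy_B = toy_A * 100  # same specificity, 100x more substrates
toy_C = ["WF", "WY", "LF"]
vectors = {name: rank_vector(subs) for name, subs in
           {"toy_A": toy_A, "toy_B": toy_B, "toy_C": toy_C}.items()}
tree = upgma(vectors)
print(tree)  # ((toy_A,toy_B),toy_C)
```

Because `toy_B` only multiplies every count in `toy_A` by 100, its rank vector is identical to that of `toy_A`, so the two merge first. This illustrates how working with ranks rather than raw frequencies can remove differences in magnitude between proteases with few and many known substrates.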