Applied Statistics Unit, Indian Statistical Institute, Kolkata-700108, India.
Genomics. 2019 Jul;111(4):549-559. doi: 10.1016/j.ygeno.2018.03.010. Epub 2018 Mar 13.
This article introduces an alignment-free method for clustering all 66 sequentially diverse DOR protein sequences. Two variants are discussed: one uses the twenty standard amino acids directly (without grouping), and the other uses a chemical grouping of the amino acids (with grouping). Each protein sequence is represented by an ordered pair frequency matrix rendered as a grayscale image, and two such images are compared with a morphology technique to obtain a similarity index. On the ND5 dataset, the variants without and with grouping achieve correlation coefficients of 0.9734 and 0.9403, respectively, against the ClustalW result, which is much better than some existing alignment-free methods. Based on the similarity index, the 66 DORs are clustered into three classes (Highest, Moderate and Lowest), a partition that is seen to fit the 66 DOR protein sequences best. Our investigation substantiates that OR83b is the distinguished olfactory receptor expressed across divergent insect populations.
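The representation step described in the abstract can be illustrated with a minimal sketch. It assumes that an "ordered pair" means two adjacent residues read left to right, and that the grayscale image is the count matrix scaled to 0-255; the abstract does not spell out either detail. The function names are hypothetical, and `similarity_index` is a simple pixel-difference stand-in, not the morphology technique used in the article.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard amino acids


def pair_frequency_matrix(seq, alphabet=AMINO_ACIDS):
    """Count ordered pairs of adjacent residues (assumed definition)."""
    index = {aa: i for i, aa in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(seq, seq[1:]):
        if a in index and b in index:  # skip symbols outside the alphabet
            counts[index[a]][index[b]] += 1
    return counts


def to_grayscale(counts):
    """Scale counts to 0..255 intensities (assumed normalisation)."""
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0] * len(row) for row in counts]
    return [[round(255 * c / peak) for c in row] for row in counts]


def similarity_index(img_a, img_b):
    """Stand-in score: 1 minus the mean absolute pixel difference over 255.

    This is NOT the morphology-based comparison from the article.
    """
    total = sum(
        abs(x - y)
        for row_a, row_b in zip(img_a, img_b)
        for x, y in zip(row_a, row_b)
    )
    n_pixels = len(img_a) * len(img_a[0])
    return 1.0 - total / (255 * n_pixels)


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    img1 = to_grayscale(pair_frequency_matrix("ACACDE"))
    img2 = to_grayscale(pair_frequency_matrix("ACDEAC"))
    print(similarity_index(img1, img2))
```

For the variant with grouping, the same functions could be reused by first mapping each residue to a group symbol and passing the group symbols as `alphabet`; the article's actual chemical grouping is not given in the abstract, so no grouping is assumed here.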