Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, The Russian Academy of Sciences, Miklukho-Maklaya St. 16/10, 117997 Moscow, Russia.
Steklov Mathematical Institute of the Russian Academy of Sciences, 8 Gubkina St., 119991 Moscow, Russia.
Int J Mol Sci. 2021 Aug 3;22(15):8339. doi: 10.3390/ijms22158339.
Most non-communicable diseases are associated with dysfunction of proteins or protein complexes. The relationship between sequence and structure has been studied for a long time, and the analysis of how sequences are organized into domains and motifs remains an active research area. Here, we propose a mathematical method for revealing the hierarchical organization of protein sequences. The method treats the pentapeptide as the unit of a protein sequence. Using the frequency of occurrence of pentapeptides in the sequences of natural proteins together with a special mathematical approach, the method revealed a hierarchical structure in the protein sequence. The method was applied to 24,647 non-homologous protein sequences, ranging in size from 50 to 400 residues, from the NRDB90 database. Statistical analysis of the branching points of the graphs revealed 11 characteristic values of y (the width of the inscribed function), showing the relationship among these multiple fragments of the sequences. Several examples illustrate how fragments of the protein spatial structure correspond to elements of the hierarchical structure of the protein sequence. This methodology provides a promising basis for a mathematically based classification of the elements of the spatial organization of proteins. Elements at different levels of the hierarchy can be used to solve biotechnological and medical problems.
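As a rough illustration of the first ingredient named in the abstract, the frequency of occurrence of pentapeptides, the following minimal Python sketch counts overlapping five-residue windows across a set of sequences. It is not the authors' implementation: the function names, the use of overlapping windows, and the toy sequences are assumptions made for illustration, and the "special mathematical approach" that builds the hierarchy is not reproduced here.

```python
from collections import Counter


def pentapeptide_counts(sequences, k=5):
    """Count overlapping k-residue windows (k=5 gives pentapeptides).

    Assumption: windows overlap and slide one residue at a time; the
    abstract does not say how the original method tallies pentapeptides.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A sequence shorter than k contributes no windows.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw counts to fractions of all counted windows."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {peptide: n / total for peptide, n in counts.items()}


# Toy sequences invented for illustration; they are not from NRDB90.
toy_sequences = ["MKVLAAGIVG", "AAGIVGKVLA"]
counts = pentapeptide_counts(toy_sequences)
freqs = relative_frequencies(counts)

for peptide, n in counts.most_common(3):
    print(peptide, n, round(freqs[peptide], 3))
```

In a real setting the input would be the filtered sequence set (here, proteins of 50 to 400 residues), and the resulting frequency table would feed whatever downstream analysis is applied to it.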