Osmanli Zarifa, Ferrero Elisa, Monzon Alexander Miguel, Tosatto Silvio C E, Piovesan Damiano
Department of Biomedical Sciences, University of Padova, Padova, 35121, Italy.
Galileian School of Higher Education, University of Padova, Padova, 35132, Italy.
Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf395.
Structured tandem repeat proteins (STRPs) are characterized by preserved structural motifs arranged in a modular way. The structural and functional diversity of STRPs makes them particularly important for studying evolution and novel structure-function relationships, and ultimately for designing new synthetic proteins with specific functions. One crucial aspect of their classification is the estimation of geometrical parameters, which can provide better insight into their properties and the relationship between the spatial arrangement of repeated units and protein function. Calculating geometric descriptors for STRPs is challenging because naturally occurring repeats are not "perfect" and often contain insertions and deletions. Existing tools for predicting structural symmetry work well on simple cases but often fail on natural proteins.
Here, we present GeomeTRe, an algorithm that calculates geometrical descriptors such as curvature (yaw), twist (roll), and pitch for a protein structure with known repeat unit positions. The algorithm simulates the movement of consecutive units, identifies rotational axes, and calculates the corresponding Tait-Bryan angles. GeomeTRe's parameters can enhance STRP annotation and classification by identifying variations in geometric arrangements among different functional groups. The package is fast and suitable for processing large protein structure datasets when repeat region information (e.g. from RepeatsDB) is available.
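The idea of describing the movement between consecutive units with Tait-Bryan angles can be illustrated with a minimal sketch. This is not GeomeTRe's implementation: it assumes two units are given as matched N x 3 coordinate arrays (for example, equivalent C-alpha atoms), estimates the rotation between them with the standard Kabsch superposition, and decomposes it in an intrinsic z-y'-x'' convention. The function names (`kabsch_rotation`, `tait_bryan_zyx`, `rotation_zyx`) and the choice of a fixed global frame are assumptions made for illustration; the package defines its own reference frame from the repeat geometry.

```python
import numpy as np


def rotation_zyx(yaw, pitch, roll):
    """Build R = Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in radians."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return Rz @ Ry @ Rx


def kabsch_rotation(P, Q):
    """Least-squares rotation R such that R @ (p - mean(P)) ~ (q - mean(Q)).

    P and Q are (N, 3) arrays of matched points from two repeat units.
    """
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    H = P0.T @ Q0                      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    D = np.diag([1.0, 1.0, d])
    return Vt.T @ D @ U.T


def tait_bryan_zyx(R):
    """Decompose R = Rz(yaw) @ Ry(pitch) @ Rx(roll) into (yaw, pitch, roll).

    Angles are returned in radians; the decomposition is unique only
    when |pitch| < 90 degrees (away from gimbal lock).
    """
    yaw = np.arctan2(R[1, 0], R[0, 0])
    pitch = -np.arcsin(np.clip(R[2, 0], -1.0, 1.0))
    roll = np.arctan2(R[2, 1], R[2, 2])
    return yaw, pitch, roll


if __name__ == "__main__":
    # Synthetic example: unit_b is unit_a rotated and translated.
    rng = np.random.default_rng(0)
    unit_a = rng.normal(size=(30, 3))
    R_true = rotation_zyx(np.radians(20.0), np.radians(-10.0), np.radians(35.0))
    unit_b = unit_a @ R_true.T + np.array([1.0, 2.0, 3.0])
    angles = tait_bryan_zyx(kabsch_rotation(unit_a, unit_b))
    print(np.degrees(angles))
```

Natural repeats with insertions and deletions do not provide matched coordinates directly, so a real pipeline would first need a structural alignment between units; that step is omitted here.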
GeomeTRe is available as a Python package; source code and documentation can be found at https://github.com/BioComputingUP/GeomeTRe.