Goethe University Frankfurt, Molecular Bioinformatics, Institute of Computer Science, Faculty of Computer Science and Mathematics, 60325 Frankfurt am Main, Germany.
Goethe University Frankfurt, University Hospital, Medical Clinic 1, 60590 Frankfurt am Main, Germany.
Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae112.
The functional complexity of biochemical processes is strongly related to the interplay of proteins and their assembly into protein complexes. In recent years, the discovery and characterization of protein complexes have substantially progressed through advances in cryo-electron microscopy, proteomics, and computational structure prediction. This development creates a strong need for computational approaches that analyse the data of large protein complexes for structural and functional characterization. Here, we aim to provide an approach that processes the growing number of large protein complexes to obtain biologically meaningful information on the hierarchical organization of their structures.
We modelled the quaternary structure of protein complexes as undirected, labelled graphs called complex graphs. In complex graphs, the vertices represent protein chains and the edges represent spatial chain-chain contacts. We hypothesized that clusters based on the complex graph correspond to functional biological modules. To compute the clusters, we applied the Leiden clustering algorithm. To evaluate our approach, we chose the human respiratory complex I, which has been extensively investigated and has a known, experimentally validated biological module structure. Additionally, we characterized a eukaryotic group II chaperonin, TRiC/CCT, and the head of the bacteriophage Φ29. The analysis of the protein complexes correlated with experimental findings and indicated known functional biological modules. Our approach enables not only the prediction of functional biological modules in large protein complexes with characteristic features but also the investigation of the flexibility of specific regions and of conformational changes. The predicted modules can aid in the planning and analysis of experiments.
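As an illustration of the complex-graph idea, the following minimal Python sketch builds an undirected graph from a list of chain-chain contacts and scores a candidate partition into modules by modularity, one of the quality functions that Leiden-type algorithms can optimize. It is not the authors' implementation: the chain names, the contact weights, the helper names `build_complex_graph` and `modularity`, and the choice of modularity as the quality function are all assumptions made for this example, and the Leiden optimization itself is not reproduced here.

```python
from collections import defaultdict


def build_complex_graph(contacts):
    """Build an undirected weighted graph as {chain: {neighbour: weight}}.

    `contacts` is an iterable of (chain_a, chain_b, weight) tuples.
    The weight is a hypothetical contact strength, e.g. a contact count.
    """
    graph = defaultdict(dict)
    for a, b, w in contacts:
        if a == b:
            continue  # ignore self-contacts in this sketch
        graph[a][b] = graph[a].get(b, 0) + w
        graph[b][a] = graph[b].get(a, 0) + w
    return dict(graph)


def modularity(graph, partition):
    """Newman modularity Q of `partition` (a dict chain -> module label)."""
    # Sum over all adjacency entries counts every edge twice, i.e. 2m.
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    if two_m == 0:
        return 0.0
    internal = defaultdict(float)  # intra-module weight, edges counted twice
    total = defaultdict(float)     # summed weighted degree per module
    for v, nbrs in graph.items():
        total[partition[v]] += sum(nbrs.values())
        for u, w in nbrs.items():
            if partition[u] == partition[v]:
                internal[partition[v]] += w
    return sum(internal[c] / two_m - (total[c] / two_m) ** 2 for c in total)


# Hypothetical toy complex: two tightly packed trimers joined by one contact.
toy_contacts = [
    ("A", "B", 1), ("B", "C", 1), ("A", "C", 1),
    ("D", "E", 1), ("E", "F", 1), ("D", "F", 1),
    ("C", "D", 1),
]
complex_graph = build_complex_graph(toy_contacts)

two_modules = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_module = {chain: 0 for chain in complex_graph}

q_two = modularity(complex_graph, two_modules)
q_one = modularity(complex_graph, one_module)
print(f"two modules: Q = {q_two:.3f}, one module: Q = {q_one:.3f}")
```

In this toy case the two-module partition scores higher than the trivial single-module partition, which mirrors the hypothesis that densely contacting groups of chains form modules. A clustering algorithm such as Leiden searches over partitions for one with a high quality score instead of evaluating hand-picked candidates.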
Jupyter notebooks to reproduce the examples are available on our public GitHub repository: https://github.com/MolBIFFM/PTGLtools/tree/main/PTGLmodulePrediction.