Jiang Peiran, Lugo-Martinez Jose
Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA.
J Comput Biol. 2025 Jul;32(7):659-674. doi: 10.1089/cmb.2025.0076. Epub 2025 May 28.
Protein pockets are essential for many proteins to carry out their functions. Locating and measuring protein pockets, as well as studying their anatomy, helps us further understand protein function. Most studies focus on learning either local or global information from protein structures; few integrate both local and global representations of these structures. In this work, we combine topological data analysis (TDA) and geometric deep learning (GDL) to analyze the putative protein pockets of enzymes. TDA captures blueprints of the global topological invariants of protein pockets, whereas GDL decomposes the fingerprints into building blocks of these pockets. This integration of local and global views provides a comprehensive and complementary understanding of the protein structural motifs ( for short) within protein pockets. We also analyze the distribution of the building blocks making up the pocket and profile the predictive power of coupling local and global representations for two tasks: discriminating between enzymes and nonenzymes, and predicting the enzyme class. We demonstrate that our representation learning framework for macromolecules is particularly useful when the structure is known and the task relies heavily on both local and global information.