Wei Xiaoqi, Chen Jiahui, Guo-Wei Wei
Department of Mathematics, Michigan State University, MI 48824, USA.
Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA.
ArXiv. 2023 Apr 6:arXiv:2301.10865v2.
Topological data analysis (TDA) is an emerging field in mathematics and data science. Its central technique, persistent homology, has had tremendous success in many science and engineering disciplines. However, persistent homology has limitations: it cannot handle heterogeneous information, such as multiple types of geometric objects; it is qualitative rather than quantitative, e.g., it counts a 5-member ring the same as a 6-member ring; and it fails to describe non-topological changes, such as homotopic changes in protein-protein binding. Persistent topological Laplacians (PTLs), such as the persistent Laplacian and the persistent sheaf Laplacian, were proposed to overcome these limitations. In this work, we examine the modeling and analysis power of PTLs in the study of the protein structures of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike receptor binding domain (RBD). First, we employ PTLs to study how the structural changes that RBD mutations induce in RBD-angiotensin-converting enzyme 2 (ACE2) binding complexes are captured by changes in the PTL spectra among SARS-CoV-2 variants. Additionally, we use PTLs to analyze the structural changes induced by RBD-ACE2 binding in various SARS-CoV-2 variants. Finally, we explore the impact of computationally generated RBD structures on a topological deep learning paradigm and on predictions of deep mutational scanning datasets for the SARS-CoV-2 Omicron BA.2 variant. Our results indicate that PTLs have advantages over persistent homology in analyzing protein structural changes and provide a powerful new TDA tool for data science.
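The ring example in the abstract can be made concrete with a small numerical sketch. It is not the persistent Laplacian used in this work: it uses the ordinary (non-persistent) combinatorial Laplacians of a single cycle graph, and the helper names `cycle_boundary` and `laplacian_summary` are hypothetical, introduced only for illustration. Under those assumptions, it shows that a 5-member and a 6-member ring share the same Betti numbers, which is all that homology records, while the smallest nonzero Laplacian eigenvalue differs between the two.

```python
import numpy as np


def cycle_boundary(n):
    """Vertex-by-edge boundary matrix of an n-member ring.

    Edge i is oriented from vertex i to vertex (i + 1) % n.
    """
    b1 = np.zeros((n, n))
    for i in range(n):
        b1[i, i] = -1.0            # tail of edge i
        b1[(i + 1) % n, i] = 1.0   # head of edge i
    return b1


def laplacian_summary(n, tol=1e-9):
    """Return (betti_0, betti_1, smallest nonzero eigenvalue of L0)."""
    b1 = cycle_boundary(n)
    l0 = b1 @ b1.T   # 0-th combinatorial Laplacian (graph Laplacian)
    l1 = b1.T @ b1   # 1-st combinatorial Laplacian (a ring has no 2-simplices)
    ev0 = np.linalg.eigvalsh(l0)
    ev1 = np.linalg.eigvalsh(l1)
    # The multiplicity of the zero eigenvalue equals the Betti number.
    betti0 = int(np.sum(np.abs(ev0) < tol))
    betti1 = int(np.sum(np.abs(ev1) < tol))
    # The non-harmonic part of the spectrum carries the extra information.
    lambda_min = float(ev0[ev0 > tol].min())
    return betti0, betti1, lambda_min


if __name__ == "__main__":
    for n in (5, 6):
        print(n, laplacian_summary(n))
```

Both rings give Betti numbers (1, 1), but the smallest nonzero eigenvalue is 2 - 2cos(2π/5) for the 5-member ring and 2 - 2cos(2π/6) = 1 for the 6-member ring, so the spectrum separates two shapes that homology alone treats as identical.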