Liu Jian, Xia Ke-Lin, Wu Jie, Yau Stephen Shing-Toung, Wei Guo-Wei
School of Mathematical Sciences, Hebei Normal University, Shijiazhuang, 050024 P. R. China.
Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China.
Acta Math Sin Engl Ser. 2022;38(10):1901-1938. doi: 10.1007/s10114-022-2326-5. Epub 2022 Oct 15.
With the great advancement of experimental tools, a tremendous amount of biomolecular data has been generated and accumulated in various databases. The high dimensionality, structural complexity, nonlinearity, and entanglement of biomolecular data, ranging from DNA knots, RNA secondary structures, protein folding configurations, chromosomes, DNA origami, and molecular assembly to others at the macromolecular level, pose a severe challenge for their analysis and characterization. In the past few decades, mathematical concepts, models, algorithms, and tools from algebraic topology, combinatorial topology, computational topology, and topological data analysis have demonstrated great power and begun to play an essential role in tackling the biomolecular data challenge. In this work, we introduce biomolecular topology, which concerns the topological problems and models originating from biomolecular systems. More specifically, biomolecular topology encompasses the topological structures, properties, and relations that emerge from biomolecular structures, dynamics, interactions, and functions. We discuss the various types of biomolecular topology arising from structures (of proteins, DNAs, and RNAs), protein folding, and protein assembly. A brief discussion of databanks (and databases), theoretical models, and computational algorithms is presented. Further, we systematically review related topological models, including graphs, simplicial complexes, persistent homology, persistent Laplacians, de Rham-Hodge theory, Yau-Hausdorff distance, and topology-based machine learning models.