Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore.
Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China.
Methods Mol Biol. 2023;2627:211-229. doi: 10.1007/978-1-0716-2974-1_12.
Molecular representations are of great importance for machine learning models in RNA data analysis. Essentially, efficient molecular descriptors or fingerprints that characterize the intrinsic structural and interactional information of RNAs can significantly boost the performance of learning models. In this paper, we introduce two persistent models, persistent homology and persistent spectral models, for representing RNA structures and interactions, together with their applications in RNA data analysis. Unlike traditional geometric and graph representations, persistent homology is built on simplicial complexes, which generalize graph models to higher dimensions. Hypergraphs are a further generalization of simplicial complexes, and hypergraph-based embedded persistent homology has been proposed recently. Moreover, persistent spectral models, which combine a filtration process with spectral models (spectral graphs, spectral simplicial complexes, and spectral hypergraphs), have been proposed for molecular representation. The persistent attributes of RNAs obtained from these two models can be combined with machine learning models for the analysis of RNA structure, flexibility, dynamics, and function.
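As a rough illustration of what combining a filtration with a spectral model can look like, the sketch below builds a distance-cutoff graph on a point cloud at several filtration values and records two spectral attributes of its graph Laplacian at each step: the number of zero eigenvalues (which equals the number of connected components, i.e. the zeroth Betti number) and the smallest nonzero eigenvalue. This is a minimal sketch under stated assumptions, not the authors' implementation: it covers only the graph (0-dimensional) case, the function names `graph_laplacian` and `persistent_spectral_features` are hypothetical, and the coordinates are toy values rather than real RNA atoms.

```python
import numpy as np


def graph_laplacian(points, cutoff):
    """Laplacian L = D - A of the graph joining points within `cutoff`."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist <= cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return np.diag(adj.sum(axis=1)) - adj


def persistent_spectral_features(points, cutoffs, tol=1e-8):
    """Track spectral attributes of the graph along a distance filtration.

    Returns one tuple per cutoff: (cutoff, betti0, lambda_min_nonzero).
    """
    features = []
    for r in cutoffs:
        eig = np.linalg.eigvalsh(graph_laplacian(points, r))
        betti0 = int(np.sum(eig < tol))  # zero eigenvalues = components
        nonzero = eig[eig >= tol]
        lambda_min_nonzero = float(nonzero.min()) if nonzero.size else 0.0
        features.append((r, betti0, lambda_min_nonzero))
    return features


# Toy coordinates (assumed for illustration): two small clusters.
points = np.array(
    [[0, 0, 0], [1, 0, 0], [0, 1, 0], [5, 0, 0], [6, 0, 0]], dtype=float
)
feats = persistent_spectral_features(points, [0.5, 1.2, 4.5, 6.5])
for r, betti0, lam in feats:
    print(f"cutoff={r:.1f}  betti0={betti0}  smallest nonzero eigenvalue={lam:.3f}")
```

As the cutoff grows, the component count falls from five isolated points to a single connected graph, while the smallest nonzero eigenvalue changes with the connectivity pattern. Sequences of such values along the filtration are the kind of persistent attribute that can be passed to a machine learning model as a feature vector.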