Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA.
Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA.
Int J Numer Method Biomed Eng. 2019 Mar;35(3):e3179. doi: 10.1002/cnm.3179. Epub 2019 Feb 7.
Despite its great success in physical modeling, differential geometry (DG) has rarely been used as a versatile tool for analyzing large, diverse, and complex molecular and biomolecular datasets, because its potential for dimensionality reduction and its ability to encode essential chemical and biological information in differentiable manifolds remain poorly understood.
We put forward a differential geometry-based geometric learning (DG-GL) hypothesis: the intrinsic physics of three-dimensional (3D) molecular structures lies on a family of low-dimensional manifolds embedded in a high-dimensional data space. We encode crucial chemical, physical, and biological information into 2D element interactive manifolds, which are extracted from the high-dimensional structural data space via a multiscale discrete-to-continuum mapping based on differentiable density estimators. Tools from differential geometry are then used to construct element interactive curvatures, which take analytical forms when the density estimators are analytically differentiable. These low-dimensional differential geometry representations are paired with a robust machine learning algorithm to demonstrate their descriptive and predictive power on large, diverse, and complex molecular and biomolecular datasets. Extensive numerical experiments show that the proposed DG-GL strategy outperforms other advanced methods in predicting drug discovery-related protein-ligand binding affinity, drug toxicity, and molecular solvation free energy.
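The discrete-to-continuum mapping and the curvature construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the scale parameter `eta`, and the function names `density_derivatives` and `curvatures` are all assumptions made for illustration. The sketch maps a set of atomic coordinates to a smooth density, differentiates that density analytically, and evaluates the Gaussian and mean curvatures of the density level surface passing through a given point, using the standard formulas for implicit surfaces.

```python
import numpy as np


def density_derivatives(point, atoms, eta=1.5):
    """Density, gradient, and Hessian of rho(r) = sum_j exp(-|r - r_j|^2 / eta^2).

    The Gaussian kernel and the scale `eta` are illustrative assumptions;
    any analytically differentiable kernel could be substituted.
    """
    d = point - atoms                                # (N, 3) displacements r - r_j
    w = np.exp(-np.sum(d**2, axis=1) / eta**2)       # (N,) kernel value per atom
    rho = w.sum()
    grad = (-2.0 / eta**2) * (w[:, None] * d).sum(axis=0)
    outer = np.einsum("n,ni,nj->ij", w, d, d)        # sum_j w_j * d_j d_j^T
    hess = (4.0 / eta**4) * outer - (2.0 / eta**2) * rho * np.eye(3)
    return rho, grad, hess


def curvatures(grad, hess):
    """Gaussian and mean curvature of the level surface of rho at a point.

    Standard implicit-surface formulas:
      K = (g^T adj(H) g) / |g|^4
      H_mean = (g^T H g - |g|^2 tr(H)) / (2 |g|^3)
    The sign of the mean curvature depends on the chosen normal orientation.
    """
    g2 = grad @ grad
    h0, h1, h2 = hess
    # Cofactor matrix from row cross products; its transpose is the adjugate.
    adj = np.array([np.cross(h1, h2), np.cross(h2, h0), np.cross(h0, h1)]).T
    gaussian = grad @ adj @ grad / g2**2
    mean = (grad @ hess @ grad - g2 * np.trace(hess)) / (2.0 * g2**1.5)
    return gaussian, mean
```

As a sanity check, the level surfaces of a single atom's density are spheres, so a point at distance r from that atom should give a Gaussian curvature of 1/r² and a mean curvature of magnitude 1/r. An element-specific version would restrict `atoms` to the atoms of one element type, and a multiscale version would repeat the evaluation for several values of `eta`.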
http://weilab.math.msu.edu/DG-GL/ Contact: wei@math.msu.edu.