A review of mathematical representations of biomolecular data.

Affiliation

Department of Mathematics, Michigan State University, MI 48824, USA.

Publication information

Phys Chem Chem Phys. 2020 Feb 26;22(8):4343-4367. doi: 10.1039/c9cp06554g.

Abstract

Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecular predictions must include structural representation as an indispensable component. Mathematical representations that simplify the biomolecular structural complexity and reduce ML dimensionality have emerged as a prime winner in D3R Grand Challenges. This review is devoted to the recent advances in developing low-dimensional and scalable mathematical representations of biomolecules in our laboratory. We discuss three classes of mathematical approaches, including algebraic topology, differential geometry, and graph theory. We elucidate how the physical and biological challenges have guided the evolution and development of these mathematical apparatuses for massive and diverse biomolecular data. We focus the performance analysis on protein-ligand binding predictions in this review although these methods have had tremendous success in many other applications, such as protein classification, virtual screening, and the predictions of solubility, solvation free energies, toxicity, partition coefficients, protein folding stability changes upon mutation, etc.
