

HDBind: encoding of molecular structure with hyperdimensional binary representations.

Author information

Jones Derek, Zhang Xiaohua, Bennion Brian J, Pinge Sumukh, Xu Weihong, Kang Jaeyoung, Khaleghi Behnam, Moshiri Niema, Allen Jonathan E, Rosing Tajana S

Affiliations

Department of Computer Science and Engineering, University of California-San Diego, La Jolla, CA, USA.

Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA, USA.

Publication information

Sci Rep. 2024 Nov 23;14(1):29025. doi: 10.1038/s41598-024-80009-w.

Abstract

Traditional methods for identifying "hit" molecules from a large collection of potential drug-like candidates rely on biophysical theory to compute approximations to the Gibbs free energy of the binding interaction between the drug and its protein target. These approaches have a significant limitation in that they require exceptional computing capabilities for even relatively small collections of molecules. Increasingly large and complex state-of-the-art deep learning approaches have gained popularity with the promise to improve the productivity of drug design, notorious for its numerous failures. However, as deep learning models increase in their size and complexity, their acceleration at the hardware level becomes more challenging. Hyperdimensional Computing (HDC) has recently gained attention in the computer hardware community due to its algorithmic simplicity relative to deep learning approaches. The HDC learning paradigm, which represents data with high-dimensional binary vectors, allows the use of low-precision binary vector arithmetic to create models of the data that can be learned without the need for the gradient-based optimization required in many conventional machine learning and deep learning methods. This algorithmic simplicity allows for acceleration in hardware that has been previously demonstrated in a range of application areas (computer vision, bioinformatics, mass spectrometry, remote sensing, edge devices, etc.). To the best of our knowledge, our work is the first to consider HDC for the task of fast and efficient screening of modern drug-like compound libraries. We also propose the first HDC graph-based encoding methods for molecular data, demonstrating consistent and substantial improvement over previous work. We compare our approaches to alternative approaches on the well-studied MoleculeNet dataset and the recently proposed LIT-PCBA dataset derived from high-quality PubChem assays. We demonstrate our methods on multiple target hardware platforms, including Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs), showing at least an order of magnitude improvement in energy efficiency versus even our smallest neural network baseline model with a single hidden layer. Our work thus motivates further investigation into molecular representation learning to develop ultra-efficient pre-screening tools. We make our code publicly available at https://github.com/LLNL/hdbind.
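The gradient-free learning that the abstract describes can be illustrated with a small, generic HDC classifier. The sketch below is not the HDBind encoding and is not taken from the authors' repository. It assumes, purely for illustration, that each molecule has already been reduced to a list of substructure tokens. The names `ItemMemory`, `bundle`, `encode`, `train`, and `predict`, the token strings, and the dimensionality of 2048 are all hypothetical. Each token gets a fixed random binary hypervector, a molecule is encoded by an element-wise majority vote over its token vectors, a class prototype is the majority vote over the encoded training molecules of that class, and a query is assigned to the prototype with the smallest Hamming distance.

```python
import random

D = 2048  # illustrative dimensionality (assumption); HDC work often uses around 10,000


class ItemMemory:
    """Hypothetical helper: maps each symbol to a fixed random binary hypervector."""

    def __init__(self, dim=D, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def get(self, symbol):
        # Draw the hypervector once per symbol, then reuse it.
        if symbol not in self.table:
            self.table[symbol] = [self.rng.getrandbits(1) for _ in range(self.dim)]
        return self.table[symbol]


def bundle(vectors):
    """Element-wise majority vote over 0/1 vectors; ties resolve to 1."""
    n = len(vectors)
    return [1 if 2 * sum(bits) >= n else 0 for bits in zip(*vectors)]


def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def encode(tokens, memory):
    """Encode one molecule (a list of tokens) as a single hypervector."""
    return bundle([memory.get(t) for t in tokens])


def train(examples, memory):
    """Single pass over (tokens, label) pairs; no gradients are involved."""
    by_label = {}
    for tokens, label in examples:
        by_label.setdefault(label, []).append(encode(tokens, memory))
    return {label: bundle(vs) for label, vs in by_label.items()}


def predict(tokens, prototypes, memory):
    """Return the label whose prototype is nearest in Hamming distance."""
    query = encode(tokens, memory)
    return min(prototypes, key=lambda label: hamming(query, prototypes[label]))


# Toy data with made-up tokens and labels, only to exercise the functions.
memory = ItemMemory()
toy_examples = [
    (["ring_a", "amine", "carbonyl"], "active"),
    (["ring_a", "amine", "halide"], "active"),
    (["ring_a", "amine", "ether"], "active"),
    (["chain_b", "ester", "sulfide"], "inactive"),
    (["chain_b", "ester", "nitro"], "inactive"),
    (["chain_b", "ester", "thiol"], "inactive"),
]
prototypes = train(toy_examples, memory)
print(predict(["ring_a", "amine", "hydroxyl"], prototypes, memory))
```

Every operation here is a bit comparison or a count, which is the kind of low-precision arithmetic the abstract credits for HDC's suitability for hardware acceleration. A practical implementation would likely pack the bits into machine words and use XOR with popcount rather than Python lists.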


