Rapid prediction of conformationally-dependent DFT-level descriptors using graph neural networks for carboxylic acids and alkyl amines.

Author information

Haas Brittany C, Hardy Melissa A, Sowndarya S V Shree, Adams Keir, Coley Connor W, Paton Robert S, Sigman Matthew S

Affiliations

Department of Chemistry, University of Utah, Salt Lake City, Utah 84112, USA

Department of Chemistry, Colorado State University, Fort Collins, Colorado 80523, USA

Publication information

Digit Discov. 2024 Nov 28;4(1):222-233. doi: 10.1039/d4dd00284a. eCollection 2025 Jan 15.

Abstract

Data-driven reaction discovery and development is a growing field that relies on the use of molecular descriptors to capture key information about substrates, ligands, and targets. Broad adoption of this strategy is hindered by the associated computational cost of descriptor calculation, especially when considering conformational flexibility. Descriptor libraries can be precomputed agnostic of application to reduce the computational burden of data-driven reaction development. However, as one often applies these models to evaluate novel hypothetical structures, it would be ideal to predict the descriptors of compounds on the fly. Herein, we report DFT-level descriptor libraries for conformational ensembles of 8528 carboxylic acids and 8172 alkyl amines towards this goal. Employing 2D and 3D graph neural network architectures trained on these libraries culminated in the development of predictive models for molecule-level descriptors, as well as the bond- and atom-level descriptors for the conserved reactive site (carboxylic acid or amine). The predictions were confirmed to be robust for an external validation set of medicinally relevant carboxylic acids and alkyl amines. Additionally, a retrospective study correlating the rate of amide coupling reactions demonstrated the suitability of the predicted DFT-level descriptors for downstream applications. Ultimately, these models enable high-fidelity predictions for a vast number of potential substrates, greatly increasing accessibility to the field of data-driven reaction development.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba37/11626426/518c7f9448df/d4dd00284a-f1.jpg
