Quantum Descriptor-Based Machine-Learning Modeling of Thermal Hazard of Cyclic Sulfamidates.

Author information

Dabros Michal, Münkler Hagen, Yerly Florence, Marti Roger, Parmentier Michaël, Udvarhelyi Anikó

Affiliations

Institute of Chemical Technology, Haute école d'ingénierie et d'architecture de Fribourg, HES-SO University of Applied Sciences and Arts Western Switzerland, CH-1700 Fribourg, Switzerland.

Pharmaceutical and Analytical Development, Novartis Pharma AG, CH-4056 Basel, Switzerland.

Publication information

J Chem Inf Model. 2025 Aug 25;65(16):8624-8636. doi: 10.1021/acs.jcim.5c01048. Epub 2025 Aug 15.

Abstract

Cyclic sulfamidates are commonly used building blocks in organic synthesis. Correct classification of their thermal criticality is crucial for the safe use of these compounds in process development and scale-up. In this study, building on our earlier work (Ferrari et al., 2022), we focused on modeling the reaction enthalpy of a family of 5-membered cyclic sulfamidates toward strong bases. The key challenge for the modeling task was the sparse availability of measured reaction enthalpies, with only 29 measurements available. To address this challenge, we used descriptors based on the quantum-chemical properties of the molecules, as they are more closely related to reaction enthalpies than typical cheminformatics-based descriptors. This approach allowed us to avoid relying solely on data-driven curve fitting and instead to model reaction enthalpies with chemistry-aware techniques, which are better suited to small data sets. Three models were constructed using the quantum-chemical descriptors: the first combines Partial Least Squares (PLS) regression with a Genetic Algorithm (GA), the second is based on the Least Absolute Shrinkage and Selection Operator (LASSO) method, and the third is a Gaussian Process Regression (GPR) model. The three models achieved coefficients of determination of 0.78, 0.67, and 0.74, respectively. Although the absolute prediction errors were close to 100 J/g, all three techniques provided similar results and correctly classified nearly all compounds into their respective thermal criticality classes. This highlights the methodology's effectiveness in providing a reliable framework for preliminary safety assessment and decision-making in process development.
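LASSO is one of the three regressors named above. The pure-Python sketch below shows how L1 shrinkage performs descriptor selection on a small table. It fits a LASSO model by cyclic coordinate descent and reports the coefficient of determination. Several details are assumptions and do not come from the paper. The demo data are synthetic. The three columns are hypothetical stand-ins for quantum-chemical descriptors. The function names (`standardize`, `lasso_cd`, `predict`, `r_squared`) and the `alpha` values are illustrative only. This is not the authors' implementation, descriptor set, or hyperparameter choice.

```python
"""LASSO by cyclic coordinate descent on a small descriptor table (sketch).

Assumptions: descriptors are already computed and stored as plain lists;
the demo data below are synthetic, not measurements from the paper.
"""
from typing import List, Tuple


def standardize(X: List[List[float]]) -> Tuple[List[List[float]], List[float], List[float]]:
    """Center each descriptor column and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        stds.append(var ** 0.5 if var > 0 else 1.0)  # constant column: avoid division by zero
    Z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]
    return Z, means, stds


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of the L1 penalty: shrink z toward 0 by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(Z: List[List[float]], y: List[float], alpha: float,
             n_iter: int = 1000, tol: float = 1e-10) -> Tuple[float, List[float]]:
    """Minimize (1/2n)*||y - b0 - Z w||^2 + alpha*||w||_1 over b0 and w.

    Because the columns of Z are centered with unit variance, the optimal
    intercept is mean(y) and each coordinate update is w_j = S(rho_j, alpha).
    """
    n, p = len(Z), len(Z[0])
    b0 = sum(y) / n
    w = [0.0] * p
    resid = [yi - b0 for yi in y]  # residual y - b0 - Z w, with w = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual; adding w[j]
            # puts back its own contribution, using mean(z_j^2) = 1.
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + w[j]
            new_wj = soft_threshold(rho, alpha)
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Z[i][j] * delta  # keep the residual in sync with w
                w[j] = new_wj
                max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once no coefficient moves appreciably
            break
    return b0, w


def predict(b0: float, w: List[float], means: List[float], stds: List[float],
            x: List[float]) -> float:
    """Predict for one raw (unstandardized) descriptor vector."""
    return b0 + sum(wj * (xj - m) / s for wj, xj, m, s in zip(w, x, means, stds))


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic table: 8 "compounds" x 3 hypothetical descriptors;
    # the target depends on the first two descriptors only.
    X = [[float(i), float((i * i) % 7), float((3 * i) % 5)] for i in range(8)]
    y = [10.0 + 2.0 * r[0] - 3.0 * r[1] for r in X]
    Z, means, stds = standardize(X)
    for alpha in (1e-6, 0.5, 10.0):
        b0, w = lasso_cd(Z, y, alpha)
        preds = [predict(b0, w, means, stds, r) for r in X]
        coefs = [wj / s for wj, s in zip(w, stds)]  # back to original descriptor units
        print(f"alpha={alpha:g} coefs={[round(c, 3) for c in coefs]} "
              f"R2={r_squared(y, preds):.3f}")
```

With a very small `alpha`, the fit approaches ordinary least squares. Increasing `alpha` drives weakly informative descriptors to exactly zero, and a large enough value removes them all. This selection behavior is what makes LASSO attractive when only a few dozen measurements are available, as with the 29 enthalpies here. In practice, `alpha` would normally be chosen by cross-validation rather than fixed by hand.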
