

Prediction of Melting Points of Chemicals with a Data Augmentation-Based Neural Network Approach.

Author information

Austermeier Lea E, Voigt Karsten, Böhme Alexander, Ulrich Nadin

Affiliations

Department of Exposure Science, Helmholtz Centre for Environmental Research - UFZ, Permoserstrasse 15, Leipzig D-04318, Germany.

PAULY, Theresienstrasse 50, Leipzig D-04129, Germany.

Publication information

ACS Omega. 2025 Jun 3;10(23):24296-24306. doi: 10.1021/acsomega.5c00205. eCollection 2025 Jun 17.

Abstract

The melting point (MP) of a chemical is an important physicochemical property that characterizes the transition from the solid to the liquid state. The MP is a key parameter in molecular design and is relevant in many fields, such as drug design and environmental science; an accurate prediction of the MP is therefore of great interest. Here, we develop two graph convolutional neural network (GNN) models for MP prediction: one without and one with a data augmentation strategy. The models were developed on a data set of 28,645 chemicals, from which we removed duplicates and data points labeled as faulty. We then split the data set into training, validation, and test sets. The models were trained both on this initial data set and on a more rigorously curated data set. Data augmentation allowed us to enlarge the number of neurons in each of the two hidden layers of the GNN and to reinforce the representation of large and complex molecules. We compared the influence of the curation step and of data augmentation and found that the curation step had no significant influence on model performance, whereas data augmentation improved the model. With a consensus model, we achieved an RMSE of 35.4 °C.
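The data-handling and evaluation steps named in the abstract (removing duplicates and faulty entries, splitting into training, validation, and test sets, scoring with the root-mean-square error, and combining several models into a consensus prediction) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the keep-first deduplication rule, the 80/10/10 random split, and the use of a simple mean as the consensus are all hypothetical choices, and neither the GNN nor the data augmentation strategy is shown.

```python
import math
import random
from statistics import mean


def deduplicate(records):
    """Drop faulty records and keep only the first entry per identifier.

    `records` is a list of (identifier, mp_celsius, is_faulty) tuples.
    This layout and the keep-first rule are hypothetical assumptions.
    """
    seen = set()
    cleaned = []
    for ident, mp, faulty in records:
        if faulty or ident in seen:
            continue  # skip flagged entries and repeated identifiers
        seen.add(ident)
        cleaned.append((ident, mp))
    return cleaned


def split(data, fractions=(0.8, 0.1, 0.1), seed=0):
    """Random train/validation/test split (80/10/10 is an assumed ratio)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])  # remainder becomes the test set


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def consensus(predictions_per_model):
    """Average the member models' predictions compound by compound.

    A plain mean is an assumed consensus rule, not necessarily the authors'.
    """
    return [mean(values) for values in zip(*predictions_per_model)]
```

A plain average is only one way to build a consensus; a median or a validation-weighted mean would fit the abstract's description equally well.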


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/afde/12177604/1e028b1d573d/ao5c00205_0001.jpg

Similar articles

1
Prediction of Melting Points of Chemicals with a Data Augmentation-Based Neural Network Approach.
ACS Omega. 2025 Jun 3;10(23):24296-24306. doi: 10.1021/acsomega.5c00205. eCollection 2025 Jun 17.
2
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.
Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.
3
Algorithm-based pain management for people with dementia in nursing homes.
Cochrane Database Syst Rev. 2022 Apr 1;4(4):CD013339. doi: 10.1002/14651858.CD013339.pub2.
4
Sertindole for schizophrenia.
Cochrane Database Syst Rev. 2005 Jul 20;2005(3):CD001715. doi: 10.1002/14651858.CD001715.pub2.
5
Eliciting adverse effects data from participants in clinical trials.
Cochrane Database Syst Rev. 2018 Jan 16;1(1):MR000039. doi: 10.1002/14651858.MR000039.pub2.
6
Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.
Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.
7
Topical antiseptics for chronic suppurative otitis media.
Cochrane Database Syst Rev. 2025 Jun 9;6(6):CD013055. doi: 10.1002/14651858.CD013055.pub3.
8
Atypical antipsychotics for disruptive behaviour disorders in children and youths.
Cochrane Database Syst Rev. 2017 Aug 9;8(8):CD008559. doi: 10.1002/14651858.CD008559.pub3.
10
Impact of residual disease as a prognostic factor for survival in women with advanced epithelial ovarian cancer after primary surgery.
Cochrane Database Syst Rev. 2022 Sep 26;9(9):CD015048. doi: 10.1002/14651858.CD015048.pub2.

