Yuan Mingzhi, Zou Zihan, Luo Yi, Jiang Jun, Hu Wei
School of Chemistry and Chemical Engineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China.
Hefei National Research Center for Physical Sciences at the Microscale, University of Science and Technology of China, 230026 Hefei, China.
J Phys Chem Lett. 2025 Apr 24;16(16):3972-3979. doi: 10.1021/acs.jpclett.5c00839. Epub 2025 Apr 13.
Developing machine learning protocols for molecular simulations requires comprehensive and efficient data sets. Here we introduce the QMe14S data set, comprising 186,102 small organic molecules featuring 14 elements (H, B, C, N, O, F, Al, Si, P, S, Cl, As, Se, and Br) and 47 functional groups. Using density functional theory at the B3LYP/TZVP level, we optimized the geometries and calculated properties, including energy, atomic charge, atomic force, dipole moment, quadrupole moment, polarizability, octupole moment, first hyperpolarizability, and Hessian. At the same level, we obtained the harmonic IR, Raman, and NMR spectra. Furthermore, we conducted ab initio molecular dynamics simulations to generate dynamic configurations and extract nonequilibrium properties, including energy, forces, and Hessians. By leveraging our E(3)-equivariant message-passing neural network (DetaNet), we demonstrated that models trained on QMe14S outperform those trained on the previously developed QM9S data set in simulating molecular spectra. The QMe14S data set thus serves as a comprehensive benchmark for molecular simulations, offering valuable insights into structure-property relationships.
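The abstract lists the Hessian among the computed properties and reports harmonic IR and Raman spectra. As an illustration of how those two are related, here is a minimal sketch of the standard harmonic analysis: mass-weight the Cartesian Hessian, diagonalize it, and convert the eigenvalues to wavenumbers. This is not the authors' code and does not reflect the QMe14S file format or DetaNet. The function name `harmonic_wavenumbers`, the toy diatomic Hessian, and the force constant are assumptions chosen for illustration, and the input is assumed to be in atomic units (Hartree/Bohr^2) with masses in amu.

```python
import numpy as np

AMU_TO_ME = 1822.888486        # electron masses per atomic mass unit
HARTREE_TO_CM1 = 219474.6313   # cm^-1 per Hartree

def harmonic_wavenumbers(hessian, masses_amu):
    """Return harmonic wavenumbers (cm^-1) from a 3N x 3N Cartesian Hessian.

    hessian    : second derivatives of the energy in Hartree/Bohr^2 (assumed units)
    masses_amu : N atomic masses in amu
    Negative return values mark imaginary modes.
    """
    hessian = np.asarray(hessian, dtype=float)
    # One mass per Cartesian coordinate (x, y, z of each atom), in electron masses.
    m = np.repeat(np.asarray(masses_amu, dtype=float) * AMU_TO_ME, 3)
    inv_sqrt_m = 1.0 / np.sqrt(m)
    # Mass-weighted Hessian: H_ij / sqrt(m_i * m_j).
    mw = hessian * np.outer(inv_sqrt_m, inv_sqrt_m)
    mw = 0.5 * (mw + mw.T)  # remove small numerical asymmetry
    eigvals = np.linalg.eigvalsh(mw)  # ascending order
    # Angular frequency in atomic units; keep the sign to flag imaginary modes.
    omega = np.sign(eigvals) * np.sqrt(np.abs(eigvals))
    return omega * HARTREE_TO_CM1

# Toy example (illustrative values only): a diatomic along the x axis with a
# single bond-stretch force constant k; all other Hessian elements are zero.
k = 1.22                       # Hartree/Bohr^2, hypothetical value
hess = np.zeros((6, 6))
hess[0, 0] = hess[3, 3] = k
hess[0, 3] = hess[3, 0] = -k
masses = [12.000, 15.995]      # amu, roughly C and O
wavenumbers = harmonic_wavenumbers(hess, masses)
print(wavenumbers[-1])         # the single stretching mode
```

In this toy model, five of the six eigenvalues are zero (translations and rotations) and the remaining one equals k divided by the reduced mass. A production workflow would normally project out translations and rotations explicitly before diagonalizing.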