Department of Physics and Materials Science, University of Luxembourg, L-1511, Luxembourg City, Luxembourg.
Institute for Materials Science and Max Bergmann Center of Biomaterials, TU Dresden, 01062, Dresden, Germany.
Sci Data. 2024 Jul 7;11(1):742. doi: 10.1038/s41597-024-03521-8.
We introduce the Aquamarine (AQM) dataset, an extensive quantum-mechanical (QM) dataset that contains structural and electronic information for 59,783 low- and high-energy conformers of 1,653 molecules, with a total number of atoms ranging from 2 to 92 (mean: 50.9) and up to 54 (mean: 28.2) non-hydrogen atoms. To gain insight into solvent effects and collective dispersion interactions in drug-like molecules, we performed QM calculations of structures and properties, supplemented with a treatment of many-body dispersion (MBD) interactions, in the gas phase and in implicit water. AQM thus contains over 40 global and local physicochemical properties (including ground-state and response properties) per conformer, computed at the tightly converged PBE0+MBD level of theory for gas-phase molecules and at the PBE0+MBD level with the modified Poisson-Boltzmann (MPB) model of water for solvated molecules. By addressing both molecule-solvent and dispersion interactions, the AQM dataset can serve as a challenging benchmark for state-of-the-art machine learning methods for property modeling and de novo generation of large (solvated) molecules with pharmaceutical and biological relevance.