Department of Chemistry, Institute of Physical Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland.
Max-Planck-Institut für Kohlenforschung, Kaiser-Wilhelm-Platz 1, 45470 Mülheim an der Ruhr, Germany; Computer-Chemie-Centrum, University of Erlangen-Nuremberg, Nägelsbachstr. 25, 91052 Erlangen, Germany.
Sci Data. 2014 Aug 5;1:140022. doi: 10.1038/sdata.2014.22. eCollection 2014.
Computational de novo design of new drugs and materials requires rigorous and unbiased exploration of chemical compound space. However, large uncharted territories persist due to its size scaling combinatorially with molecular size. We report computed geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of CHONF. These molecules correspond to the subset of all 133,885 species with up to nine heavy atoms (CONF) out of the GDB-17 chemical universe of 166 billion organic molecules. We report geometries minimal in energy, corresponding harmonic frequencies, dipole moments, polarizabilities, along with energies, enthalpies, and free energies of atomization. All properties were calculated at the B3LYP/6-31G(2df,p) level of quantum chemistry. Furthermore, for the predominant stoichiometry, C7H10O2, there are 6,095 constitutional isomers among the 134k molecules. We report energies, enthalpies, and free energies of atomization at the more accurate G4MP2 level of theory for all of them. As such, this data set provides quantum chemical properties for a relevant, consistent, and comprehensive chemical space of small organic molecules. This database may serve the benchmarking of existing methods, development of new methods, such as hybrid quantum mechanics/machine learning, and systematic identification of structure-property relationships.
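The composition constraint stated in the abstract (elements drawn from C, H, O, N, F, with at most nine heavy atoms) can be illustrated with a minimal sketch. The function names `parse_formula`, `heavy_atom_count`, and `in_dataset_scope` are hypothetical and are not part of the data set or any accompanying tooling. The check works on the stoichiometry only and does not reproduce the GDB-17 enumeration or its stability filters.

```python
import re

# Elements named in the abstract; hydrogen is not counted as a heavy atom.
ALLOWED_ELEMENTS = {"C", "H", "O", "N", "F"}


def parse_formula(formula):
    """Parse a simple molecular formula such as 'C7H10O2' into element counts.

    Assumption: no parentheses, charges, or isotopes appear in the formula.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def heavy_atom_count(formula):
    """Count all non-hydrogen atoms in the formula."""
    return sum(n for element, n in parse_formula(formula).items() if element != "H")


def in_dataset_scope(formula, max_heavy=9):
    """Return True if the stoichiometry fits the stated composition limits."""
    counts = parse_formula(formula)
    only_allowed = set(counts) <= ALLOWED_ELEMENTS
    return only_allowed and 1 <= heavy_atom_count(formula) <= max_heavy


if __name__ == "__main__":
    for formula in ("C7H10O2", "C10H22", "CH3Cl"):
        print(formula, heavy_atom_count(formula), in_dataset_scope(formula))
```

For the predominant stoichiometry C7H10O2, the count is 7 carbon plus 2 oxygen atoms, which is exactly the nine-heavy-atom limit.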