Oak Ridge National Laboratory, Computational Sciences and Engineering Division, Oak Ridge, 37831, USA.
Oak Ridge National Laboratory, Computer Science and Mathematics Division, Oak Ridge, 37831, USA.
Sci Data. 2023 Aug 21;10(1):546. doi: 10.1038/s41597-023-02408-4.
We present two open-source datasets that provide time-dependent density-functional tight-binding (TD-DFTB) electronic excitation spectra of organic molecules. These datasets contain predicted UV-vis absorption spectra computed on the optimized geometries of the molecules in their electronic ground state. The GDB-9-Ex dataset contains a subset of 96,766 organic molecules from the original open-source GDB-9 dataset. The ORNL_AISD-Ex dataset consists of 10,502,904 organic molecules that contain between 5 and 71 non-hydrogen atoms. The data reveal a close quantitative correlation between the gap separating the highest occupied molecular orbital (HOMO) from the lowest unoccupied molecular orbital (LUMO) and the excitation energy of the lowest singlet excited state. The chemical variability of the large number of molecules was examined with a topological fingerprint estimation based on extended-connectivity fingerprints (ECFPs), followed by uniform manifold approximation and projection (UMAP) for dimension reduction. Both datasets were generated using the DFTB+ software on the "Andes" cluster of the Oak Ridge Leadership Computing Facility (OLCF).
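The reported correlation between the HOMO-LUMO gap and the lowest singlet excitation energy could be checked with a Pearson coefficient, as in the minimal sketch below. The function name `pearson_r`, the variable names, and the five energy values are assumptions made for illustration; the numbers are invented placeholders, not values from GDB-9-Ex or ORNL_AISD-Ex, and the abstract does not state which statistic the authors used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder energies in eV, invented for illustration only.
# They are NOT taken from the published datasets.
homo_lumo_gap_ev = [3.1, 4.0, 4.6, 5.2, 6.0]
lowest_singlet_ev = [3.0, 3.8, 4.5, 5.3, 5.9]

r = pearson_r(homo_lumo_gap_ev, lowest_singlet_ev)
print(f"Pearson r = {r:.3f}")
```

With the real data, the two lists would be replaced by the per-molecule gap and first-excitation values parsed from the dataset files; a coefficient near 1 would correspond to the close correlation described above.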