Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK.
ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK.
Sci Data. 2024 Jan 17;11(1):80. doi: 10.1038/s41597-023-02897-3.
A database of thermally activated delayed fluorescent (TADF) molecules was automatically generated from the scientific literature. It consists of 25,482 data records with an overall precision of 82%. Of these, 5,349 records have chemical names represented as SMILES strings, with 91% accuracy; these records are grouped in a subsidiary database. Each data record contains one of the following four properties: maximum emission wavelength (λ), photoluminescence quantum yield (PLQY), singlet-triplet energy splitting (ΔE), and delayed lifetime (τ). The databases were created through text mining using ChemDataExtractor, a chemistry-aware natural-language-processing toolkit, which has been adapted for TADF research. The text-mined corpus consisted of 2,733 papers from the Royal Society of Chemistry and Elsevier. To the best of our knowledge, these are the first databases to have been auto-generated for TADF molecules from existing publications. The databases have been publicly released for experimental and computational applications in the TADF research field.
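As a rough illustration of the record structure described in the abstract, the following Python sketch models one data record holding exactly one of the four properties and selects the subset that carries a SMILES string, mirroring the subsidiary database. The class name `TadfRecord`, its field names, the property labels, and the sample rows are all hypothetical; they are not taken from the released databases or from ChemDataExtractor, and the sample values are placeholders rather than extracted data.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical labels for the four properties named in the abstract.
PROPERTIES = {
    "emission_wavelength",        # maximum emission wavelength
    "plqy",                       # photoluminescence quantum yield
    "singlet_triplet_splitting",  # singlet-triplet energy splitting
    "delayed_lifetime",           # delayed lifetime
}


@dataclass(frozen=True)
class TadfRecord:
    """One record: a compound plus exactly one property value (assumed layout)."""
    compound_name: str
    prop: str
    value: float
    unit: str
    smiles: Optional[str] = None  # present only for a subset of records

    def __post_init__(self) -> None:
        if self.prop not in PROPERTIES:
            raise ValueError(f"unknown property: {self.prop!r}")


def with_smiles(records: List[TadfRecord]) -> List[TadfRecord]:
    """Return the records that carry a SMILES string (the subsidiary subset)."""
    return [r for r in records if r.smiles]


def precision(n_correct: int, n_checked: int) -> float:
    """Fraction of manually checked records judged correct."""
    if n_checked <= 0:
        raise ValueError("n_checked must be positive")
    return n_correct / n_checked


# Placeholder rows for illustration only; not real extracted data.
demo = [
    TadfRecord("compound-A", "emission_wavelength", 500.0, "nm", smiles="c1ccccc1"),
    TadfRecord("compound-A", "plqy", 80.0, "%", smiles="c1ccccc1"),
    TadfRecord("compound-B", "delayed_lifetime", 5.0, "us"),
]
subset = with_smiles(demo)
```

Storing one property per record, rather than one row per molecule, matches the abstract's statement that each data record contains a single property; a molecule with several reported properties then simply appears in several records.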