Cavendish Laboratory, University of Cambridge, J. J. Thomson Avenue, Cambridge, CB3 0HE, UK.
ISIS Neutron and Muon Source, Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0QX, UK.
Sci Data. 2022 May 3;9(1):192. doi: 10.1038/s41597-022-01295-5.
The ability to auto-generate databases of optical properties holds great potential for advancing optical research, especially with regard to the data-driven discovery of optical materials. An optical property database of refractive indices and dielectric constants is presented, which comprises a total of 49,076 refractive index and 60,804 dielectric constant data records on 11,054 unique chemicals. The database was auto-generated from a corpus of 388,461 scientific papers using the state-of-the-art natural language processing software ChemDataExtractor. The data repository offers a representative overview of the information on linear optical properties reported in scientific papers over the past 30 years. Public availability of these data will enable quick searches for the optical properties of specific materials. The large size of this repository will accelerate data-driven research on the design and prediction of optical materials and their properties. To the best of our knowledge, this is the first database of optical properties auto-generated from a large number of scientific papers. We provide a web interface to aid the use of our database.
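To illustrate the kind of quick lookup such a database supports, here is a minimal sketch that filters extracted records by compound name and, optionally, by property type. The record layout (`OpticalRecord` with `compound`, `prop`, `value`, `doi` fields), the `search` helper, and the example values and placeholder DOIs are all hypothetical. They are not the schema or contents of the released database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpticalRecord:
    """One extracted data record (hypothetical schema for illustration only)."""
    compound: str  # chemical name as written in the source paper
    prop: str      # "refractive_index" or "dielectric_constant"
    value: float
    doi: str       # paper the record was extracted from


def search(records, compound, prop=None):
    """Return records for a compound, optionally restricted to one property.

    Matching is a case-insensitive comparison of compound names only;
    real chemical-name normalisation is out of scope for this sketch.
    """
    key = compound.casefold()
    return [
        r for r in records
        if r.compound.casefold() == key and (prop is None or r.prop == prop)
    ]


# Illustrative values and placeholder DOIs, not taken from the database.
records = [
    OpticalRecord("TiO2", "refractive_index", 2.5, "10.0000/example-1"),
    OpticalRecord("TiO2", "dielectric_constant", 80.0, "10.0000/example-2"),
    OpticalRecord("SiO2", "refractive_index", 1.46, "10.0000/example-3"),
]

hits = search(records, "tio2", prop="refractive_index")
print([(r.compound, r.value) for r in hits])  # → [('TiO2', 2.5)]
```

Keeping the source DOI on each record reflects the fact that every entry was extracted from a specific paper, so a search result can be traced back to its origin.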