College of Chemistry, Nankai University, Tianjin 300071, China.
Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China.
J Chem Inf Model. 2021 Mar 22;61(3):1053-1065. doi: 10.1021/acs.jcim.0c01203. Epub 2021 Feb 23.
The development of functional organic fluorescent materials calls for fast and accurate prediction of photophysical parameters, for example in high-throughput virtual screening, but the limitations of quantum mechanical calculations make this task challenging. We establish a database covering more than 4300 solvated organic fluorescent dyes, comprising 3000 distinct compounds, and develop a new machine learning approach for efficient and accurate prediction of emission wavelength and photoluminescence quantum yield (PLQY). Our feature engineering yields a functionalized structure descriptor (FSD) and a comprehensive general solvent descriptor (CGSD), on which we build a highly black-box computational framework. The framework offers consistently good accuracy across different dye families, the ability to describe substitution and solvent effects, efficiency for large-scale prediction, and compatibility with on-the-fly learning. Evaluations on unseen molecules give a mean absolute error of 0.13 for PLQY and 0.080 eV for emission energy, the latter comparable to time-dependent density functional theory (TD-DFT) calculations. We built an online prediction platform on the ensemble model to make predictions in various solvents. Our statistical learning methodology will complement quantum mechanical calculations as an efficient alternative for predicting these parameters.
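The abstract describes predicting emission wavelength but reports the error as an emission energy in eV. The two are related by E = hc/λ, with hc ≈ 1239.84 eV·nm. The sketch below shows that conversion and the mean absolute error (MAE) metric. It is not the authors' code. The function names and the sample wavelengths are hypothetical and serve only to illustrate the arithmetic.

```python
# Illustrative sketch only: not the authors' implementation.
# Function names and sample data are hypothetical.

HC_EV_NM = 1239.84  # h*c expressed in eV·nm, so E(eV) = HC_EV_NM / wavelength(nm)


def wavelength_to_ev(wavelength_nm: float) -> float:
    """Convert a photon wavelength in nm to its energy in eV."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm


def ev_to_wavelength(energy_ev: float) -> float:
    """Convert a photon energy in eV to its wavelength in nm."""
    if energy_ev <= 0:
        raise ValueError("energy must be positive")
    return HC_EV_NM / energy_ev


def mean_absolute_error(predicted: list[float], reference: list[float]) -> float:
    """Average of |predicted - reference| over paired values."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical predicted vs. measured emission maxima (nm).
    predicted_nm = [520.0, 610.0]
    reference_nm = [500.0, 600.0]
    mae_ev = mean_absolute_error(
        [wavelength_to_ev(w) for w in predicted_nm],
        [wavelength_to_ev(w) for w in reference_nm],
    )
    print(f"MAE in emission energy: {mae_ev:.3f} eV")

    # Wavelength shift that a 0.080 eV error corresponds to near 500 nm.
    e_ref = wavelength_to_ev(500.0)
    shift_nm = ev_to_wavelength(e_ref - 0.080) - 500.0
    print(f"0.080 eV near 500 nm is about {shift_nm:.1f} nm")
```

Because the conversion is nonlinear in λ, the same energy error maps to a larger wavelength error at longer wavelengths. For example, near 500 nm (about 2.48 eV), an error of 0.080 eV corresponds to a shift of roughly 16 nm.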