Souza Rubens C, Duarte Julio C, Goldschmidt Ronaldo R, Borges Itamar
Departamento de Engenharia de Defesa, Instituto Militar de Engenharia (IME), Praça Gen. Tibúrcio 80, Rio de Janeiro, Rio de Janeiro 22290 270, Brazil.
Departamento de Engenharia da Computação, Instituto Militar de Engenharia (IME), Praça Gen. Tibúrcio 80, Rio de Janeiro, Rio de Janeiro 22290 270, Brazil.
J Chem Inf Model. 2025 Apr 14;65(7):3270-3281. doi: 10.1021/acs.jcim.4c02403. Epub 2025 Mar 20.
The search for functional fluorescent organic materials can benefit significantly from rapid and accurate prediction of photophysical properties. However, screening large numbers of candidate fluorophores in different solvents is constrained by the limitations of quantum mechanical calculations and experimental measurements. In this work, we develop machine learning (ML) algorithms for predicting the fluorescence of a molecule, focusing on two target properties: emission wavelengths (WLs) and quantum yields (QYs). For this purpose, we employ the Deep4Chem database, which contains the optical properties of 20,236 chromophore-solvent combinations covering 7,016 chromophores and 365 different solvents. Several chemical descriptors, or features, were selected as inputs for each model, and each molecule was characterized by its SMILES fingerprint. The Shapley additive explanations (SHAP) technique was used to rationalize the results, showing that the most impactful features are chromophore-related, as expected from chemical intuition. For the best-performing model, the Random Forest, the test-set root-mean-square error (RMSE) is 28.8 nm (0.15 eV) for WLs and 0.19 for QYs. The trained ML models were then used to predict the WL and QY values missing from the original Deep4Chem database, yielding two completed databases, one for each property. The models also gave good results when tested, for each target property, on molecules not included in the original Deep4Chem database.
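The abstract reports the wavelength error in both nm and eV. As a minimal sketch of how such figures can be computed (not the authors' code), the Python below converts wavelengths to photon energies with E = hc/λ and evaluates the RMSE in each unit. The function names and the three wavelength pairs are hypothetical and serve only as illustration. Because the nm-to-eV conversion is nonlinear, the sketch computes the eV error on the converted values rather than by rescaling the nm error; whether the paper did the same is an assumption.

```python
import math

# hc expressed in eV*nm (rounded physical constant).
HC_EV_NM = 1239.841984


def nm_to_ev(wavelength_nm):
    """Convert a wavelength in nm to a photon energy in eV via E = hc / lambda."""
    return HC_EV_NM / wavelength_nm


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical emission wavelengths in nm (illustrative, not from Deep4Chem).
observed_nm = [450.0, 520.0, 610.0]
predicted_nm = [470.0, 505.0, 640.0]

# Error in wavelength units.
rmse_nm = rmse(predicted_nm, observed_nm)

# Error in energy units, computed after converting each value to eV.
rmse_ev = rmse(
    [nm_to_ev(w) for w in predicted_nm],
    [nm_to_ev(w) for w in observed_nm],
)

print(f"RMSE: {rmse_nm:.1f} nm, {rmse_ev:.3f} eV")
```

The same `rmse` helper applies unchanged to quantum yields, which are dimensionless and need no unit conversion.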