Noreen Sadaf, Aljaafreh Mamduh J
Department of Chemistry, University of Gujrat, Gujrat, 50700, Punjab, Pakistan.
Physics Department, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 11623, Saudi Arabia.
J Fluoresc. 2025 Sep 18. doi: 10.1007/s10895-025-04540-3.
The development of organic sensitizers with high molar extinction coefficients (ε) is important for a range of light-absorption applications. To accelerate the discovery of such compounds, a machine learning (ML) analysis is applied to explore their vast chemical space. A dataset of 676 organic chromophores is analyzed by computing electronic, topological, and molecular descriptors to predict ε. Among the 10 ML models tested, the Gradient Boosting, Random Forest, Extra Trees, and Histogram-based Gradient Boosting regressors show a good correlation between experimental and predicted values (R ≈ 0.70). Shapley feature-importance analysis reveals that the Subgraph Density of Secondary Carbon-Hydrogen (SdsCH) descriptor and the logarithm of the partition coefficient van der Waals Surface Area Descriptor 8 (SlogP_VSA8) have a significant impact on model performance. Additionally, using a retrosynthetic bond-breaking analysis, 3288 novel structures with potentially high ε have been generated, and their feasibility is assessed through dimensionality reduction analysis. Synthetic accessibility (SA) calculations identify the top structures for future experimental synthesis. Interestingly, the findings indicate that new structures with SMILES lengths of 35-80 units can exhibit the highest SA.
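The reported R ≈ 0.70 measures how closely predicted ε values track experimental ones. The abstract does not state which correlation coefficient was used; the minimal sketch below assumes the Pearson coefficient. The function name `pearson_r` and the sample values are hypothetical, invented purely for illustration, and are not data from the study.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in the ratio).
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_obs * var_pred)


# Hypothetical log10(ε) values, made up for this example only.
experimental = [4.2, 4.8, 5.1, 4.5, 4.9]
predicted = [4.4, 4.6, 4.9, 4.7, 5.0]

r = pearson_r(experimental, predicted)
print(f"R = {r:.2f}")
```

A value near 1 would indicate that the predictions rise and fall with the measurements almost linearly; a value around 0.70, as reported, indicates a clear but imperfect linear relationship.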