Accelerated Chemical Space Generation of High Molar Extinction Organic Sensitizers via Machine Learning.

Author information

Noreen Sadaf, Aljaafreh Mamduh J

Affiliations

Department of Chemistry, University of Gujrat, Gujrat, 50700, Punjab, Pakistan.

Physics Department, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 11623, Saudi Arabia.

Publication information

J Fluoresc. 2025 Sep 18. doi: 10.1007/s10895-025-04540-3.

Abstract

The development of organic sensitizers with high molar extinction coefficients (ε) is important for a range of light-absorption applications. To accelerate the discovery of such compounds, machine learning (ML) was applied to explore their vast chemical space. A dataset of 676 organic chromophores was analyzed by computing electronic, topological, and molecular descriptors to predict ε. Among the 10 ML models tested, the Gradient Boosting, Random Forest, Extra Trees, and Histogram-based Gradient Boosting regressors show good correlation between experimental and predicted values (R ≈ 0.70). Shapley feature importance reveals that the Subgraph Density of Secondary Carbon-Hydrogen (SdsCH) descriptor and the logarithm of the partition coefficient van der Waals Surface Area Descriptor 8 (SlogP_VSA8) have a significant impact on model performance. Additionally, using retrosynthetic bond-breaking analysis, 3288 novel structures with potentially high ε were generated, and their feasibility was assessed through dimensionality reduction analysis. Synthetic accessibility (SA) calculations identify the top candidate structures for future experimental synthesis. Interestingly, the findings indicate that new structures with SMILES lengths of 35-80 units can exhibit the highest SA.
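The reported R ≈ 0.70 can be made concrete with a small calculation. The sketch below assumes that R denotes the Pearson correlation coefficient between experimental and predicted ε values, which the abstract does not state explicitly. The function name `pearson_r` and the numbers in the two lists are hypothetical, chosen only for illustration; they are not data from the paper.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_obs * var_pred)


# Hypothetical values for illustration only (assumption, not from the paper).
experimental = [4.2, 4.8, 5.1, 3.9, 4.5]
predicted = [4.4, 4.6, 4.9, 4.3, 4.1]

r = pearson_r(experimental, predicted)
print(f"R = {r:.2f}")
```

A value near 1 would mean the predictions track the measurements almost linearly, while a value near 0 would mean no linear relationship. Under this assumption, R ≈ 0.70 sits between the two: the models capture the overall trend, but a substantial share of the variation in ε is left unexplained.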
