Byadi Said, Hashim P K, Sidorov Pavel
Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo, Hokkaido, 001-0021, Japan.
Research Institute for Electronic Science, Hokkaido University, Kita 20, Nishi 10, Kita-ku, Sapporo, Hokkaido, 001-0020, Japan.
J Cheminform. 2025 Apr 1;17(1):42. doi: 10.1186/s13321-025-00993-7.
In this manuscript, we present a strategy for modeling the photoswitch properties (maximum absorption wavelength and thermal half-life of the photoisomers) of visible-light azo-photoswitches using structural data. We compile a comprehensive data set from literature sources and perform a rigorous benchmark to select the best feature type and modeling approach. Fragment counts demonstrated the best performance in the benchmark for both properties. We validate the models in cross-validation and on an external set. The predicted absorption wavelengths for this external set are highly accurate. The thermal half-life model, on the other hand, is less reliable, likely because of the modest size of the half-life data set, although a consensus modeling approach improves its predictivity. We also interpret the modeling results using the ColorAtom approach and provide insights into the chemical space covered by the data set.

Scientific contribution: This paper provides a machine learning approach, based only on structural features, to predict two important photoswitch properties. Unlike previous studies, we do not use any quantum chemical features, which accelerates the modeling procedure while the accuracy of the models remains high. Moreover, fragment counts offer a unique approach to model interpretation that is useful for the rational design of photoswitches with desired properties.
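The abstract names two technical ingredients, fragment-count descriptors and consensus modeling, without specifying either. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes that fragments are linear paths of atoms and bonds, each counted once and written in a direction-independent form. It also assumes that the consensus is a plain arithmetic mean over several models. The names `fragment_counts`, `consensus_predict` and `linear_model` are hypothetical, as are the toy weights in the usage block. Those weights are arbitrary numbers and are not fitted to any data.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical bond representation: (atom index, atom index, bond symbol such as "-" or "=").
Bond = Tuple[int, int, str]
# A "model" here is any callable mapping a fragment-count vector to a number.
Model = Callable[[Counter], float]


def fragment_counts(atoms: Sequence[str], bonds: Sequence[Bond],
                    min_atoms: int = 2, max_atoms: int = 3) -> Counter:
    """Count linear (path) fragments containing min_atoms..max_atoms atoms.

    Assumption: fragments are simple paths written as alternating atom and
    bond symbols. The smaller of the two reading directions is used as the
    key, so A-B and B-A are the same fragment.
    """
    # Adjacency list that remembers the bond symbol of each edge.
    adj: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(atoms))}
    for i, j, sym in bonds:
        adj[i].append((j, sym))
        adj[j].append((i, sym))

    counts: Counter = Counter()

    def extend(path: List[int], tokens: List[str]) -> None:
        n = len(path)
        # Count each undirected path once: single atoms always, longer paths
        # only when the start index is smaller than the end index.
        if n >= min_atoms and (n == 1 or path[0] < path[-1]):
            forward = "".join(tokens)
            backward = "".join(reversed(tokens))
            counts[min(forward, backward)] += 1
        if n == max_atoms:
            return
        for nxt, sym in adj[path[-1]]:
            if nxt not in path:  # simple paths only: never revisit an atom
                extend(path + [nxt], tokens + [sym, atoms[nxt]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    return counts


def linear_model(weights: Dict[str, float], bias: float) -> Model:
    """Hypothetical linear model over fragment counts; unseen fragments contribute 0."""
    return lambda feats: bias + sum(weights.get(k, 0.0) * v for k, v in feats.items())


def consensus_predict(models: Sequence[Model], features: Counter) -> float:
    """Consensus prediction taken as the arithmetic mean of the individual model outputs."""
    return mean(model(features) for model in models)


if __name__ == "__main__":
    # Toy C-N=N-C chain standing in for an azo core (hydrogens and rings omitted).
    azo = fragment_counts(["C", "N", "N", "C"], [(0, 1, "-"), (1, 2, "="), (2, 3, "-")])
    print(azo)
    # Two hypothetical models with arbitrary toy weights, not fitted to any data.
    models = [linear_model({"N=N": 10.0}, 300.0), linear_model({"C-N=N": 5.0}, 310.0)]
    print(consensus_predict(models, azo))  # mean of 310.0 and 320.0
```

The fragments are counted rather than merely flagged as present or absent, so each descriptor value can be read directly as "how many times this substructure occurs". This property is what makes fragment-based models convenient to interpret. Averaging several models is the simplest consensus rule. Weighted or median-based schemes are equally plausible alternatives, and the abstract does not say which one is used.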