Ye Zong-Rong, Huang I-Shou, Chan Yu-Te, Li Zhong-Ji, Liao Chen-Cheng, Tsai Hao-Rong, Hsieh Meng-Chi, Chang Chun-Chih, Tsai Ming-Kang
Department of Chemistry, National Taiwan Normal University, Taipei 11677, Taiwan
Department of Chemistry, The University of Chicago, Chicago, IL 60637, USA
RSC Adv. 2020 Jun 23;10(40):23834-23841. doi: 10.1039/d0ra05014h. eCollection 2020 Jun 19.
Organic fluorescent molecules play critical roles in fluorescence inspection, biological probes, and labeling indicators. This study compiled more than ten thousand organic fluorescent molecules and applied a machine-learning approach to extract the intrinsic structural characteristics that correlate with fluorescence emission. A systematic informatics procedure was introduced, consisting of descriptor cleaning, descriptor space reduction, and statistically meaningful regression, to build a broad and valid model for estimating the fluorescence emission wavelength. Least absolute shrinkage and selection operator (Lasso) regression coupled with a random forest model was ultimately adopted as the numerical predictor, and it satisfied the statistical criteria. This informatics model showed predictive ability comparable to that of the conventional time-dependent density functional theory method for emission wavelength prediction, and is complementary to it, at a fraction of the computational expense.
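The descriptor cleaning and descriptor space reduction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the function names, the penalty `lam=5.0`, and the use of plain coordinate descent are all assumptions made for illustration, and the random forest stage that the abstract couples with Lasso is not shown. In practice a library implementation of both models would likely be used instead.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def varying_columns(X):
    """Descriptor cleaning: indices of descriptors that are not constant."""
    return [j for j in range(len(X[0])) if len({row[j] for row in X}) > 1]


def standardize(X, cols):
    """Return the chosen columns, each scaled to zero mean and unit variance."""
    n = len(X)
    out = []
    for j in cols:
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        out.append([(v - mean) / std for v in col])
    return out


def lasso_cd(cols, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1.

    Assumes standardized columns and a centered target.
    """
    n = len(y)
    w = [0.0] * len(cols)
    resid = list(y)  # residual y - Xw, kept up to date as w changes
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            new_w = soft_threshold(rho, lam)
            delta = new_w - w[j]
            if delta:
                resid = [r - delta * c for c, r in zip(col, resid)]
                w[j] = new_w
    return w


def demo(seed=0):
    """Synthetic example: 6 hypothetical descriptors, 2 of them informative."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(200):
        row = [rng.gauss(0, 1) for _ in range(6)]
        row[3] = 1.0  # a constant descriptor that cleaning should drop
        X.append(row)
        # Made-up emission wavelength in nm, driven by descriptors 0 and 2 only.
        y.append(450 + 40 * row[0] - 25 * row[2] + rng.gauss(0, 1))
    kept = varying_columns(X)
    mean_y = sum(y) / len(y)
    w = lasso_cd(standardize(X, kept), [v - mean_y for v in y], lam=5.0)
    # Descriptor space reduction: keep descriptors with non-zero Lasso weight.
    selected = [j for j, wj in zip(kept, w) if wj != 0.0]
    return kept, selected


if __name__ == "__main__":
    kept, selected = demo()
    print("descriptors kept after cleaning:", kept)
    print("descriptors selected by Lasso:", selected)
```

The L1 penalty drives the weights of uninformative descriptors to exactly zero, which is why Lasso can serve as a descriptor filter ahead of a second, nonlinear regressor.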