Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, North Carolina 27606, United States.
Computer Science, NC State University, Raleigh, North Carolina 27606, United States.
Anal Chem. 2021 Dec 7;93(48):16076-16085. doi: 10.1021/acs.analchem.1c03741. Epub 2021 Nov 23.
Ultraviolet-visible (UV-Vis) absorption spectra are routinely collected as part of high-performance liquid chromatography (HPLC) analysis and can be used to identify chemical reaction products by comparison with reference spectra. Here, we present UV-adVISor, a new computational tool for predicting UV-Vis spectra from a molecule's structure alone. UV-Vis prediction was approached as a sequence-to-sequence problem. We used long short-term memory (LSTM) and attention-based neural networks, with either extended-connectivity fingerprints of diameter 6 (ECFP6) or molecule SMILES as input, to generate predictive models for the UV spectra. We produced two spectrum datasets (dataset I, N = 949, and dataset II, N = 2222) using different compound collections and spectrum acquisition methods to train, validate, and test our models. We evaluated the prediction accuracy of the complete spectra by the correspondence of the wavelengths of absorbance maxima and by a series of statistical measures computed over the entire spectrum curve (the best test-set median values for model II are given in parentheses): RMSE (0.064), R² (0.71), and dynamic time warping (DTW, 0.194). Scrambling the pairing of molecule structures and experimental spectra during training resulted in a degraded R², confirming the utility of the approaches for prediction. UV-adVISor is able to provide fast and accurate predictions for libraries of compounds.
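The curve-level measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes each spectrum is a list of absorbance values sampled on a shared wavelength grid, takes the second measure to be the coefficient of determination, and uses the textbook DTW recurrence with an absolute-difference local cost. The abstract does not specify any normalization or DTW variant, so the function names and these choices are illustrative assumptions.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length spectra."""
    if len(observed) != len(predicted):
        raise ValueError("spectra must share the same wavelength grid")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination (assumed definition, see lead-in)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def dtw_distance(a, b):
    """Classic dynamic time warping cost with |x - y| as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def peak_index(spectrum):
    """Grid index of the absorbance maximum."""
    return max(range(len(spectrum)), key=spectrum.__getitem__)
```

For a toy spectrum and a copy shifted by one grid point, RMSE and R² penalize the shift at every wavelength, whereas DTW can realign the two curves and therefore reports a smaller cost than the pointwise sum of absolute differences. Comparing `peak_index` for the two curves gives the offset of the absorbance maximum in grid points.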