Research Center for Analytical Sciences, Frontiers Science Center for New Organic Matter, College of Chemistry, Tianjin Key Laboratory of Biosensing and Molecular Recognition, State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin 300071, China.
Haihe Laboratory of Sustainable Chemical Transformations, Tianjin 300192, China.
J Chem Inf Model. 2022 Aug 22;62(16):3695-3703. doi: 10.1021/acs.jcim.2c00786. Epub 2022 Aug 2.
An autoencoder architecture was adopted for near-infrared (NIR) spectral analysis to extract the common features of the spectra. Three autoencoder-based networks with different purposes were constructed. First, a spectral encoder was established by training the network with a set of spectra as the input. The spectral features are encoded by the nodes in the bottleneck layer and can in turn be used to build a sparse and robust model. Second, by taking the spectra of one instrument as the input and those of another instrument as the reference output, the features common to both sets of spectra are obtained in the bottleneck layer. In the prediction step, the spectral features of the second instrument can therefore be predicted by using the reverse of the decoder as the encoder. Third, transfer learning was used to build models for the spectra of additional instruments by fine-tuning the trained network. NIR datasets of plant, wheat, and pharmaceutical tablet samples measured on multiple instruments were used to test the method. The multiple linear regression (MLR) model built on the encoded features showed similar or slightly better prediction performance than the partial least-squares (PLS) model.
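The first of the three networks (a spectral encoder whose bottleneck features feed an MLR model) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic "spectra", the layer sizes, the tanh activation, the gradient descent with step halving, and all variable and function names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for NIR spectra (illustrative only, not the paper's data):
# three hidden components mixed into 50 "wavelength" channels plus noise.
n_samples, n_channels, n_bottleneck = 200, 50, 3
concentrations = rng.uniform(0.0, 1.0, size=(n_samples, n_bottleneck))
pure_spectra = rng.normal(size=(n_bottleneck, n_channels))
spectra = concentrations @ pure_spectra + 0.01 * rng.normal(size=(n_samples, n_channels))
target = concentrations[:, 0]  # hypothetical property to be predicted
centered = spectra - spectra.mean(axis=0)


def encode(params, x):
    """Map spectra to bottleneck features (tanh is an assumed activation)."""
    w_enc, b_enc, _, _ = params
    return np.tanh(x @ w_enc + b_enc)


def loss(params, x):
    """Mean squared error between the linear decoder output and the input."""
    _, _, w_dec, b_dec = params
    return float(np.mean((encode(params, x) @ w_dec + b_dec - x) ** 2))


def gradients(params, x):
    """Backpropagate the mean squared error through decoder and encoder."""
    w_enc, b_enc, w_dec, b_dec = params
    hidden = np.tanh(x @ w_enc + b_enc)
    d_out = 2.0 * (hidden @ w_dec + b_dec - x) / x.size
    d_hidden = (d_out @ w_dec.T) * (1.0 - hidden ** 2)
    return (x.T @ d_hidden, d_hidden.sum(axis=0), hidden.T @ d_out, d_out.sum(axis=0))


params = (
    0.1 * rng.normal(size=(n_channels, n_bottleneck)),
    np.zeros(n_bottleneck),
    0.1 * rng.normal(size=(n_bottleneck, n_channels)),
    np.zeros(n_channels),
)
initial_loss = loss(params, centered)
current_loss, step = initial_loss, 0.5
for _ in range(2000):
    grads = gradients(params, centered)
    trial = tuple(p - step * g for p, g in zip(params, grads))
    trial_loss = loss(trial, centered)
    if trial_loss < current_loss:  # accept only steps that reduce the error
        params, current_loss = trial, trial_loss
    else:                          # otherwise halve the step size and retry
        step *= 0.5
final_loss = current_loss

# MLR on the encoded features: ordinary least squares with an intercept column.
features = encode(params, centered)
design = np.column_stack([np.ones(n_samples), features])
coef, *_ = np.linalg.lstsq(design, target, rcond=None)
predicted = design @ coef
r_squared = 1.0 - np.sum((target - predicted) ** 2) / np.sum((target - target.mean()) ** 2)
print(f"reconstruction MSE: {initial_loss:.4f} -> {final_loss:.4f}, training R^2: {r_squared:.3f}")
```

Under the same assumptions, the second network described above would differ mainly in the training target: the spectra of one instrument would be the input and the spectra of another instrument the reference output, in place of reconstructing the input itself.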