Shen Huanchao, Geng Yingrui, Ni Hongfei, Wang Hui, Wu Jizhong, Hao Xianwei, Tie Jinxin, Luo Yingjie, Xu Tengfei, Chen Yong, Liu Xuesong
College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Hangzhou 310018, China
RSC Adv. 2022 Nov 14;12(50):32641-32651. doi: 10.1039/d2ra05563e. eCollection 2022 Nov 9.
With the development of near-infrared (NIR) spectroscopy, various calibration transfer algorithms have been proposed, but such algorithms usually assume that the samples share the same distribution. In machine learning, transfer learning can achieve calibration transfer between different types of samples without requiring many samples. This paper proposes an instance transfer learning algorithm based on a boosted weighted extreme learning machine (weighted ELM) to construct NIR quantitative analysis models across different instruments for tobacco in practical production. The support vector machine (SVM), weighted ELM, and weighted ELM-AdaBoost models were compared after the spectral data were preprocessed by standard normal variate (SNV) and principal component analysis (PCA). The weighted ELM-TrAdaBoost model was then built using data from the other domain to realize the transfer from different source domains to the target domain. The coefficient of determination of prediction (Rp²) of the weighted ELM-TrAdaBoost model for the four target components (nicotine, Cl, K, and total nitrogen) reached 0.9426, 0.8147, 0.7548, and 0.6980, respectively. The results demonstrated the superiority of ensemble learning and the value of source domain samples for model construction, both of which improved the models' generalization ability and prediction performance. This is a viable approach when modeling with small sample sizes and has the advantage of fast learning.
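As a rough illustration of two building blocks named in the abstract, the sketch below shows SNV preprocessing and a weighted ELM regressor in NumPy. It is not the authors' implementation: the class name `WeightedELM`, the sigmoid activation, the hidden-layer size `n_hidden`, the regularization constant `C`, and the ridge-style closed-form solution are all assumptions chosen for illustration. The per-sample weights are the quantity that a boosting scheme such as AdaBoost or TrAdaBoost would update between rounds; that update loop is not shown here.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row)
    by its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


class WeightedELM:
    """Hypothetical minimal weighted ELM regressor (illustrative only).

    Hidden-layer weights are random and fixed; only the output weights
    are fitted, by solving a sample-weighted, regularized least-squares
    problem in closed form.
    """

    def __init__(self, n_hidden=50, C=1.0, seed=0):
        self.n_hidden = n_hidden  # assumed hidden-layer size
        self.C = C                # assumed regularization constant
        self.seed = seed

    def _hidden(self, X):
        # Sigmoid activation of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, sample_weight=None):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        if sample_weight is None:
            w = np.ones(len(y))
        else:
            w = np.asarray(sample_weight, dtype=float)
        HtW = H.T * w  # equivalent to H^T @ diag(w)
        A = HtW @ H + np.eye(self.n_hidden) / self.C
        # Output weights: beta = (H^T W H + I/C)^{-1} H^T W y
        self.beta = np.linalg.solve(A, HtW @ y)
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

In a boosting setting, a sample whose weight shrinks toward zero contributes correspondingly less to the fitted output weights, which is how an instance-transfer scheme can down-weight source-domain samples that do not resemble the target domain.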