Zhu Qun-Xiong, Zhang Hong-Tao, Tian Ye, Zhang Ning, Xu Yuan, He Yan-Lin
College of Information Science & Technology, Beijing University of Chemical Technology, Beijing, 100029, China; Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing 100029, China.
ISA Trans. 2023 Mar;134:290-301. doi: 10.1016/j.isatra.2022.08.021. Epub 2022 Aug 26.
As industrialization advances, the production scale and complexity of process industries continue to grow. However, because samples in the process industry are scarce and unevenly distributed, it is difficult to establish accurate and efficient data-driven soft sensor models for predicting some variables. Generating new virtual samples based on the original sample distribution to extend the sample set is a promising way to address this problem and to broaden the application of soft sensor models. In this paper, a novel virtual sample generation method based on the co-training of two K-Nearest Neighbor (KNN) models is proposed. First, sparse regions in each dimension of the feature space are identified according to a sparse parameter. Second, the input features of virtual samples are generated in these sparse regions by interpolation. Third, the outputs of the virtual samples are predicted by two KNN regressors trained through co-training. Qualified virtual samples are then screened and used to update the models, improving the prediction accuracy of the two KNN regressors. To verify the effectiveness and superiority of the proposed co-training-based virtual sample generation method (CTVSG), case studies are conducted using two standard functions and a Purified Terephthalic Acid (PTA) industrial dataset; the results confirm the effectiveness of CTVSG.
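The three steps described above can be illustrated with a minimal sketch. The abstract does not say how the sparse parameter is defined, how the interpolation is performed, or which criterion qualifies a virtual sample, so everything below is an assumption rather than the authors' implementation. Here a gap between neighboring values in one dimension is treated as sparse when it exceeds `sparse_param` times the mean gap, interpolation is a uniform random draw inside that gap, and a simple agreement check between two KNN regressors with different `k` stands in for the co-training screening. All function and parameter names (`sparse_gaps`, `generate_inputs`, `co_train_vsg`, `sparse_param`, `k1`, `k2`, `tol`) are hypothetical.

```python
import random
from math import dist


def knn_predict(X, y, query, k):
    """Average the targets of the k training points nearest to `query`."""
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i], query))[:k]
    return sum(y[i] for i in nearest) / len(nearest)


def sparse_gaps(X, dim, sparse_param):
    """Return (low, high) intervals along `dim` that are wider than
    sparse_param * mean gap (an assumed reading of the 'sparse parameter')."""
    values = sorted(row[dim] for row in X)
    if len(values) < 2:
        return []
    mean_gap = (values[-1] - values[0]) / (len(values) - 1)
    return [(a, b) for a, b in zip(values, values[1:])
            if b - a > sparse_param * mean_gap]


def generate_inputs(X, sparse_param, rng):
    """Create candidate inputs by interpolating inside each sparse interval."""
    candidates = []
    for dim in range(len(X[0])):
        for low, high in sparse_gaps(X, dim, sparse_param):
            # Copy the real sample at the lower edge, then move it along `dim` only.
            base = min(X, key=lambda row: abs(row[dim] - low))
            point = list(base)
            point[dim] = low + rng.random() * (high - low)
            candidates.append(tuple(point))
    return candidates


def co_train_vsg(X, y, sparse_param=1.5, k1=2, k2=4, tol=0.1, seed=0):
    """Generate virtual samples and keep those on which two KNN regressors agree.

    Assumed screening rule: a candidate qualifies when the two predictions
    differ by at most `tol`; its output is the mean of the two predictions.
    Accepted samples are added to the training set used for later candidates.
    """
    rng = random.Random(seed)
    X_aug, y_aug = list(X), list(y)
    accepted = []
    for x_new in generate_inputs(X, sparse_param, rng):
        p1 = knn_predict(X_aug, y_aug, x_new, k1)
        p2 = knn_predict(X_aug, y_aug, x_new, k2)
        if abs(p1 - p2) <= tol:
            y_new = (p1 + p2) / 2
            X_aug.append(x_new)  # update both regressors' shared training set
            y_aug.append(y_new)
            accepted.append((x_new, y_new))
    return X_aug, y_aug, accepted


if __name__ == "__main__":
    # Toy 1-D data with a large gap between 0.3 and 1.0.
    X = [(0.0,), (0.1,), (0.2,), (0.3,), (1.0,)]
    y = [2 * x[0] for x in X]
    _, _, accepted = co_train_vsg(X, y, tol=1.0)
    print(accepted)
```

In this sketch `tol` controls how strict the screening is: a small value rejects candidates on which the two regressors disagree, while a large value accepts almost everything. A more faithful co-training scheme would let each regressor label samples for the other and judge candidates by their effect on prediction error, which the abstract implies but does not detail.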