College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China.
Intelligence and Information Engineering College, Tangshan University, Tangshan, China.
Curr Top Med Chem. 2023;23(8):618-626. doi: 10.2174/1568026623666230117121531.
Small-sample problems are widespread in the chemical industry, chemistry, biology, medicine, and the food industry, and they have long hindered process modeling and system optimization. This study addresses the small-sample problem in modeling. The process parameters for the ultrasonic extraction of botanical medicinal materials can be obtained by optimizing an extraction rate model. However, data are difficult to acquire, so the model must be built from a small sample, which reduces its prediction accuracy.
A virtual sample generation method based on full factorial design (FFD) is proposed to solve the problem of a small sample size. Experiments are first conducted according to a Box-Behnken Design (BBD) to obtain a small set of samples, and a response surface function is fitted to them. Virtual sample inputs are then generated by the FFD, and the corresponding virtual sample outputs are calculated with the response surface function. Furthermore, a screening method for virtual samples is proposed based on an extreme learning machine (ELM): the connection weights of the ELM are used to further optimize and screen the generated virtual samples.
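The generation step can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes three factors in coded units (-1, 0, 1), a second-order response surface, and a five-level full factorial grid; none of these are stated in the abstract. The coefficients in `TRUE_COEF` are hypothetical and serve only to produce placeholder "experimental" responses.

```python
import itertools
import numpy as np

def box_behnken_3(n_center=3):
    """Coded Box-Behnken design for 3 factors: 12 edge midpoints plus center runs."""
    runs = []
    for i, j in itertools.combinations(range(3), 2):
        for a, b in itertools.product((-1.0, 1.0), repeat=2):
            point = [0.0, 0.0, 0.0]
            point[i], point[j] = a, b
            runs.append(point)
    runs.extend([[0.0, 0.0, 0.0]] * n_center)
    return np.array(runs)

def quadratic_features(X):
    """Columns: intercept, linear terms, two-factor interactions, squared terms."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in itertools.combinations(range(k), 2)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

def fit_response_surface(X, y):
    """Least-squares fit of the second-order response surface coefficients."""
    coef, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)
    return coef

def full_factorial(levels, k):
    """Every combination of the given levels across k factors."""
    return np.array(list(itertools.product(levels, repeat=k)))

# Hypothetical coefficients standing in for real extraction-rate measurements.
TRUE_COEF = np.array([80.0, 2.0, -1.5, 0.5, 0.8, 0.0, -0.4, -3.0, -2.0, -1.0])

X_real = box_behnken_3()                         # small experimental design
y_real = quadratic_features(X_real) @ TRUE_COEF  # placeholder responses
coef = fit_response_surface(X_real, y_real)

# Virtual inputs from a denser full factorial grid (5 levels is an assumption).
X_virtual = full_factorial(np.linspace(-1.0, 1.0, 5), 3)
y_virtual = quadratic_features(X_virtual) @ coef  # virtual outputs
```

In this sketch the 15 BBD runs are expanded to 125 virtual input points, each with an output predicted by the fitted surface. The ELM-based screening step is not shown, because the abstract does not specify the screening rule.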
The results show that virtual sample data can effectively expand the sample size. The precision of the model trained on semi-synthetic samples (small-size experimental samples plus virtual samples) is higher than that of the model trained only on the small-size experimental samples.
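The kind of comparison described above can be set up with a basic ELM regressor, sketched below under stated assumptions. The ELM here is the standard formulation (random, fixed hidden-layer weights; output weights from the Moore-Penrose pseudoinverse). The class name `ELMRegressor`, the function `toy_response`, the hidden-layer size, and all data are hypothetical. The virtual outputs are drawn from the toy function itself, so this only shows the mechanics of training on semi-synthetic data and does not reproduce or support the paper's results.

```python
import numpy as np

class ELMRegressor:
    """Basic extreme learning machine: random hidden layer, least-squares output layer."""

    def __init__(self, n_hidden=20, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Hidden-layer activations; W and b stay fixed after fit().
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def toy_response(X):
    # Hypothetical response, used only to generate placeholder data.
    return 80.0 - 3.0 * X[:, 0] ** 2 - 2.0 * X[:, 1] ** 2 + X[:, 0] * X[:, 1]

rng = np.random.default_rng(42)
X_real = rng.uniform(-1, 1, size=(15, 2))      # small "experimental" set
y_real = toy_response(X_real)
X_virtual = rng.uniform(-1, 1, size=(100, 2))  # stand-in for screened virtual samples
y_virtual = toy_response(X_virtual)
X_test = rng.uniform(-1, 1, size=(50, 2))
y_test = toy_response(X_test)

# Semi-synthetic set: experimental samples plus virtual samples.
X_semi = np.vstack([X_real, X_virtual])
y_semi = np.concatenate([y_real, y_virtual])

model_small = ELMRegressor().fit(X_real, y_real)
model_semi = ELMRegressor().fit(X_semi, y_semi)
rmse_small = rmse(model_small.predict(X_test), y_test)
rmse_semi = rmse(model_semi.predict(X_test), y_test)
print(f"RMSE, experimental only: {rmse_small:.4f}")
print(f"RMSE, semi-synthetic:    {rmse_semi:.4f}")
```

Comparing the two RMSE values on held-out points mirrors the evaluation the abstract describes. With real data, the virtual outputs would come from the fitted response surface rather than from a known function.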
The virtual sample generation and screening methods proposed in this paper can effectively solve the problem of modeling with small samples. Reliable process parameters can be obtained by optimizing the model trained on the semi-synthetic samples.