Department of Biotechnology and Food Technology, Faculty of Science, Doornfontein Campus, University of Johannesburg, P.O. Box 17011, Johannesburg, 2028, Gauteng, South Africa.
Food Innovation Research Group, Department of Biotechnology and Food Technology, Faculty of Science, University of Johannesburg, P.O. Box 17011, Johannesburg, 2028, Gauteng, South Africa.
Sci Rep. 2023 Jul 20;13(1):11755. doi: 10.1038/s41598-023-38322-3.
Artificial neural networks (ANNs) have increasingly been applied to predictive modelling of food processing operations, including fermentation, because they can learn complex nonlinear relationships in high-dimensional datasets that lie beyond the scope of conventional regression models. A major limitation of ANNs, however, is that they require large amounts of training data to perform well, and obtaining that much data from biological processes is often impractical. Data augmentation methods address this problem by inflating existing datasets with artificially synthesized, valid samples. In this paper, we present a generative adversarial network (GAN) that can synthesize an effectively unlimited amount of realistic multi-dimensional regression data from limited experimental data (n = 20). Rigorous testing showed that the synthesized data (n = 200) significantly conserved the variances and distribution patterns of the real data. The synthetic data were then used to generalize a deep neural network. The model trained on the artificial data showed a lower loss (2.029 ± 0.124) and converged to a solution faster than its counterpart trained on the real data (2.1614 ± 0.117).
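The adversarial setup the abstract describes can be sketched in miniature. The snippet below is a minimal illustration, not the authors' architecture: a one-layer affine generator and a logistic-regression discriminator are trained adversarially on a small hypothetical 1-D sample (n = 20), after which the generator synthesizes n = 200 points. All data values, hyperparameters, and network sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" experimental data: a small sample (n = 20) standing in
# for the limited fermentation measurements described in the paper.
real = rng.normal(loc=5.0, scale=1.5, size=20)

# Generator G(z) = a*z + b maps standard-normal latent noise to samples.
a, b = 1.0, 0.0
# Discriminator D(x) = sigmoid(w*x + c) scores samples as real (~1) or fake (~0).
w, c = 0.1, 0.0

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

lr = 0.05
for step in range(8000):
    # --- Discriminator update: push D(real) toward 1 and D(G(z)) toward 0 ---
    x = rng.choice(real, size=16)
    z = rng.standard_normal(16)
    g = a * z + b
    d_real = sigmoid(w * x + c)
    d_fake = sigmoid(w * g + c)
    # Gradients of the binary cross-entropy loss w.r.t. w and c
    grad_w = np.mean(-(1.0 - d_real) * x + d_fake * g)
    grad_c = np.mean(-(1.0 - d_real) + d_fake)
    w -= lr * grad_w
    c -= lr * grad_c

    # --- Generator update (non-saturating loss): push D(G(z)) toward 1 ---
    z = rng.standard_normal(16)
    g = a * z + b
    d_fake = sigmoid(w * g + c)
    grad_a = np.mean(-(1.0 - d_fake) * w * z)
    grad_b = np.mean(-(1.0 - d_fake) * w)
    a -= lr * grad_a
    b -= lr * grad_b

# Once trained, the generator can synthesize an arbitrary number of samples,
# e.g. inflating n = 20 measurements to n = 200 synthetic ones.
synthetic = a * rng.standard_normal(200) + b
print(synthetic.mean(), synthetic.std())
```

The same alternating two-player loop scales directly to the multi-dimensional regression data in the paper by replacing the scalar parameters with multilayer networks; the paper's reported validation step (checking that variances and distributions of the synthetic sample match the real one) would then be applied to `synthetic`.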