Li Yizhang, Liu Lingyu, Wang Zhongmin, Chang Tianying, Li Ke, Xu Wenqing, Wu Yong, Yang Hua, Jiang Daoli
Institute of Automation, Qilu University of Technology (Shandong Academy of Sciences), Key Laboratory of UWB & THz of Shandong Academy of Sciences, Jinan, China.
Shandong Fupai Ejiao Co., Ltd., Jinan, China.
Front Nutr. 2022 Jul 14;9:925717. doi: 10.3389/fnut.2022.925717. eCollection 2022.
Low-cost methods are needed to authenticate important foods and traditional Chinese medicines (TCMs), and terahertz time-domain spectroscopy (THz-TDS) is a promising route to highly accurate identification. In this study, feedforward artificial neural networks (ANNs) based on terahertz spectra are employed to predict the animal origin of gelatins. Their suitability for this task is examined with parallel models built from random sample partitions and random initializations. Although the feedforward ANNs predict the training samples accurately, their generalization performance on the original data is unsatisfactory. A multivariate scattering correction (MSC) is applied to improve prediction accuracy, and 20 additional models confirm the effectiveness of this preprocessing. A special partition of the full dataset is then constructed from the statistics of the parallel models, and its influence on ANN performance is investigated with another 20 models. These models perform poorly because, according to principal component analysis (PCA), the training and test sets differ notably. By comparing the distribution of the first two principal components before and after MSC, we found that the reciprocal of the minimum number of line segments required for error-free classification in the 2-D feature space can serve as an index of the linear separability of the data. Higher linear separability by this index reduces the need for harsh parameter tuning of ANN models and makes them more tolerant of random initialization. The difference in principal components between the samples of the training set and those of the test set determines whether a partition is acceptable and whether a model will generalize. A rapid way to estimate the performance of an ANN on a classification task, before extensive tuning, is to compare the differences between groups with the differences within groups.
Because a representative spectrum lacking characteristic absorption peaks is discussed in this article, the analysis of gelatin THz spectra may also help studies of other featureless species.
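The abstract names a multivariate scattering correction but does not give its formula. As an assumption, the sketch below uses the standard multiplicative scatter correction form: each spectrum is regressed on a reference spectrum (by default the mean spectrum) as an offset plus a scaled copy of the reference, and the fitted offset and slope are then removed. The function name `msc` and the synthetic spectra are hypothetical and for illustration only.

```python
import numpy as np


def msc(spectra, reference=None):
    """Scatter-correct a (n_samples, n_points) array of spectra.

    Assumed standard form: each spectrum x_i is fitted as
    x_i ~ a_i + b_i * ref by least squares, then replaced by (x_i - a_i) / b_i.
    Returns the corrected spectra and the reference that was used.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        ref = spectra.mean(axis=0)  # mean spectrum as the reference
    else:
        ref = np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        # np.polyfit(x, y, 1) returns [slope, intercept] for y ~ slope * x + intercept
        b, a = np.polyfit(ref, x, 1)
        corrected[i] = (x - a) / b
    return corrected, ref


# Hypothetical example: three copies of one curve with different offsets and scales
base = np.sin(np.linspace(0, 3, 50)) + 2
spectra = np.array([0.5 + 1.5 * base, -0.2 + 0.8 * base, base])
corrected, ref = msc(spectra)
```

Because the synthetic spectra differ only by an offset and a scale, every corrected row collapses onto the reference. With real data, the reference would normally be computed from the training set alone and passed to `reference` when correcting test spectra, so that no test information leaks into preprocessing.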
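The abstract suggests comparing differences between groups with differences within groups, and checking whether training and test samples differ in principal-component space, but it does not define a formula. One common choice, assumed here, is a Fisher-style ratio of between-group to within-group scatter computed on the first two PCA scores. The same function can score class separation (labels = animal origin) or train/test mismatch (labels = "train"/"test"). The names `pca_scores` and `between_within_ratio` are hypothetical, and the random data only illustrates the behavior.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project mean-centred rows of X onto the leading principal axes (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def between_within_ratio(scores, labels):
    """Ratio of between-group to within-group scatter (traces of S_B and S_W).

    Large values: groups are far apart relative to their spread.
    Small values: groups overlap.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    overall = scores.mean(axis=0)
    between = 0.0
    within = 0.0
    for g in np.unique(labels):
        members = scores[labels == g]
        centre = members.mean(axis=0)
        between += len(members) * np.sum((centre - overall) ** 2)
        within += np.sum((members - centre) ** 2)
    return between / within


# Hypothetical data: 30 + 30 samples with 20 spectral points each
rng = np.random.default_rng(0)
labels = np.array(["A"] * 30 + ["B"] * 30)
separated = np.vstack([rng.normal(0.0, 0.1, (30, 20)),
                       rng.normal(5.0, 0.1, (30, 20))])
overlapping = rng.normal(0.0, 1.0, (60, 20))

ratio_sep = between_within_ratio(pca_scores(separated), labels)
ratio_overlap = between_within_ratio(pca_scores(overlapping), labels)
```

A high ratio for class labels hints that a classifier should be easy to train. A high ratio for train/test labels warns that the partition itself differs, which matches the abstract's explanation for the poorly performing special partition.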