Institute of Science and Technology, Niigata University, 8050 2-no-cho, Ikarashi, Nishi-ku, Niigata 950-2181, Japan.
ImVisionLabs Inc., 7-3-1, Hongo, Bunkyo-ku, Tokyo 113-8485, Japan.
Spectrochim Acta A Mol Biomol Spectrosc. 2024 Dec 5;322:124785. doi: 10.1016/j.saa.2024.124785. Epub 2024 Jul 4.
Measuring the chemical composition of soybeans is time-consuming and laborious, and even simple near-infrared sensors generally require calibration curves to be created before use. In this study, a new soybean screening method that needs no calibration curves was investigated by combining the excitation-emission matrix (EEM) with dimensionality reduction analysis. The EEMs of 34 soybean samples were measured, and representative chemical contents, including crude protein, crude oil, and isoflavone, were determined by chemical analysis. Two dimensionality reduction methods, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), were applied to the EEM data to obtain two-dimensional plots, each of which was divided into two regions corresponding to high or low content of each chemical component. To classify each component as high or low, machine learning classification models were constructed on the two-dimensional plots obtained after dimensionality reduction. Classification accuracy was higher with t-SNE than with the combination of PC1 and PC2 from PCA, and with t-SNE it exceeded 90% for all chemical components. These results suggest that t-SNE dimensionality reduction of soybean EEMs has the potential to enable easy and accurate screening of soybeans, especially on the basis of isoflavone content.
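The workflow described in the abstract (flatten each EEM, reduce to two dimensions with PCA or t-SNE, then classify high versus low content on the 2-D plot) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the EEMs are synthetic, the grid size (20 excitation by 30 emission wavelengths), the standardization step, the t-SNE perplexity, the RBF-kernel SVM, and the 5-fold cross-validation are all choices made here for illustration, and the helper names `make_synthetic_eems`, `embed_2d`, and `classification_accuracy` are hypothetical. The abstract does not state which classifier or validation scheme was used.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_eems(n_samples=34, n_ex=20, n_em=30, seed=0):
    """Build synthetic EEMs (assumption: stand-ins for measured soybean data).

    Each EEM is one Gaussian fluorescence peak plus noise; samples labelled 1
    ("high" content) get a stronger peak than samples labelled 0 ("low").
    """
    rng = np.random.default_rng(seed)
    labels = np.arange(n_samples) % 2  # alternate low (0) / high (1)
    ex = np.linspace(0.0, 1.0, n_ex)[:, None]
    em = np.linspace(0.0, 1.0, n_em)[None, :]
    peak = np.exp(-((ex - 0.4) ** 2 + (em - 0.6) ** 2) / 0.02)
    eems = np.stack([
        (1.0 + 0.5 * lab) * peak + 0.05 * rng.standard_normal((n_ex, n_em))
        for lab in labels
    ])
    return eems, labels


def embed_2d(eems, method="tsne", seed=0):
    """Flatten each EEM to a vector and reduce it to two dimensions."""
    X = eems.reshape(len(eems), -1)         # (n_samples, n_ex * n_em)
    X = StandardScaler().fit_transform(X)   # assumption: per-feature scaling
    if method == "pca":
        return PCA(n_components=2).fit_transform(X)  # PC1 and PC2 scores
    # Perplexity must be smaller than the number of samples (34 here).
    tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=seed)
    return tsne.fit_transform(X)


def classification_accuracy(points_2d, labels, folds=5):
    """Mean cross-validated accuracy of an SVM trained on the 2-D plot."""
    clf = SVC(kernel="rbf", gamma="scale")  # assumption: classifier choice
    return cross_val_score(clf, points_2d, labels, cv=folds).mean()


if __name__ == "__main__":
    eems, labels = make_synthetic_eems()
    for method in ("pca", "tsne"):
        points = embed_2d(eems, method=method)
        acc = classification_accuracy(points, labels)
        print(f"{method}: mean CV accuracy = {acc:.2f}")
```

One design point worth noting: scikit-learn's `TSNE` has no `transform` method for unseen samples, so in this sketch the embedding is computed on all samples before cross-validation. Accuracies obtained this way describe how well the 2-D plot separates the two groups rather than performance on new, unembedded soybeans.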