Optimizing feature selection with gradient boosting machines in PLS regression for predicting moisture and protein in multi-country corn kernels via NIR spectroscopy.

Affiliations

Department of Agricultural and Biological Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.

Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, USA.

Publication information

Food Chem. 2024 Oct 30;456:140062. doi: 10.1016/j.foodchem.2024.140062. Epub 2024 Jun 10.

Abstract

Differences in moisture and protein content impact both nutritional value and processing efficiency of corn kernels. Near-infrared (NIR) spectroscopy can be used to estimate kernel composition, but models trained on a few environments may underestimate error rates and bias. We assembled corn samples from diverse international environments and used NIR with chemometrics and partial least squares regression (PLSR) to determine moisture and protein. The potential of five feature selection methods to improve prediction accuracy was assessed by extracting sensitive wavelengths. Gradient boosting machines (GBMs), particularly CatBoost and LightGBM, were found to effectively select crucial wavelengths for moisture (1409, 1900, 1908, 1932, 1953, 2174 nm) and protein (887, 1212, 1705, 1891, 2097, 2456 nm). SHAP plots highlighted significant wavelength contributions to model prediction. These results illustrate GBMs' effectiveness in feature engineering for agricultural and food sector applications, including developing multi-country global calibration models for moisture and protein in corn kernels.

