The National Engineering Laboratory of Crop Resistance Breeding, School of Life Sciences, Anhui Agricultural University, Hefei 230036, China.
School of Materials and Chemistry, Anhui Agricultural University, Hefei, Anhui 230036, China.
Bioresour Technol. 2024 Dec;413:131531. doi: 10.1016/j.biortech.2024.131531. Epub 2024 Sep 23.
Cellulose and hemicellulose are key cross-linked carbohydrates affecting bioethanol production from maize stalks. Traditional wet chemical methods for their detection are labor-intensive, highlighting the need for high-throughput techniques. This study used Fourier transform infrared (FTIR) spectroscopy combined with machine learning (ML) algorithms on a large panel of 200 maize germplasms to develop robust predictive models for stalk cellulose, hemicellulose and holocellulose content. We identified several peak height features correlated with the three contents and used them as input data for model building. Four ML algorithms demonstrated high predictive accuracy, achieving coefficients of determination (R²) ranging from 0.83 to 0.97. Notably, the Categorical Boosting algorithm yielded the optimal models, with R² exceeding 0.91 for the training set and 0.81 for the test set. Combining FTIR spectroscopy with ML algorithms offers a precise and high-throughput tool for predicting stalk cellulose, hemicellulose and holocellulose contents, benefiting maize genetic breeding for bioenergy and biofuels.
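The accuracy figures above are coefficients of determination. As a minimal sketch of how that metric is computed from observed and predicted contents, assuming the standard definition 1 - SS_res / SS_tot (the abstract does not state which variant the authors used): the function name `r_squared` and the sample values are hypothetical and are not data from the study.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    obs_mean = mean(observed)
    # Residual sum of squares: error left after prediction
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observed values around their mean
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical cellulose contents (% dry weight), for illustration only
observed = [30.0, 32.0, 34.0, 36.0, 38.0]
predicted = [31.0, 32.0, 33.0, 36.0, 38.0]
score = r_squared(observed, predicted)
print(f"R^2 = {score:.2f}")
```

Computing the score separately on the training and test samples gives the two kinds of values the abstract reports; a test-set value well below the training-set value would suggest overfitting.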