Xu Xinze, Cheng Zhang, Liu Wenbo, Lin Chunhua, Xia Weibo, Shyu Hsiang-Yang, Miao Weiguo, Yuan Heyang
School of Tropical Agriculture and Forestry/Key Laboratory of Green Prevention and Control of Tropical Plant Diseases and Pests, Ministry of Education, Hainan University, Haikou 570228, China.
Department of Civil & Environmental Engineering, Temple University, 1947 N. 12th Street, Philadelphia, PA 19122, United States.
Comput Struct Biotechnol J. 2025 May 27;27:2403-2411. doi: 10.1016/j.csbj.2025.05.044. eCollection 2025.
Early and accurate identification of phytopathogenic conidia, which cause substantial agricultural and economic losses, is critical for disease prevention and management. However, traditional morphological and molecular methods are time-consuming, labor-intensive, and often lack species-level resolution. Here, we aimed to develop a new classification approach based on Raman spectroscopy and data-driven modeling. Across the seven selected fungal species, three characteristic Raman bands at 1003-1005 cm⁻¹, 1153-1157 cm⁻¹, and 1515-1522 cm⁻¹ showed a consistent pattern and were attributed to carotenoids. Principal component analysis (PCA) of the spectra showed substantial overlap between species, which was insufficient for clustering. Consequently, three data-driven models, support vector machines (SVMs), decision trees (DTs), and eXtreme Gradient Boosting (XGBoost), were trained with three categories of features (number of peaks, maximum peak, and curve roughness) extracted within eight characteristic wavenumber ranges. After hyperparameter tuning, the optimal SVM, DT, and XGBoost models achieved prediction precisions of 0.88, 0.88, and 0.96, respectively. PCA-XGBoost, trained by feeding the principal components from PCA into XGBoost, achieved a prediction precision of 0.94, suggesting that features extracted from the raw spectra outperformed PCA-derived features for data-driven classification. In summary, XGBoost based on raw spectral feature extraction achieved high classification precision for conidial Raman spectra. This study lays the groundwork for the precise classification of phytopathogenic conidia through data-driven analysis of Raman spectra, demonstrating great potential for the prevention and control of plant fungal diseases.
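The abstract names three feature categories (number of peaks, maximum peak, and curve roughness) computed within characteristic wavenumber ranges, but it does not define them. The sketch below shows one plausible way to turn a spectrum into such a feature vector before it is passed to a classifier such as an SVM, DT, or XGBoost (training is not shown). It is not the authors' implementation. It assumes that peaks are strict local maxima and that roughness is the mean absolute second difference of intensity. The window bounds in `WINDOWS`, the helper names, and the synthetic spectrum are all hypothetical. The paper uses eight ranges, which are not listed in the abstract, so only three illustrative windows around the carotenoid bands appear here.

```python
import math
from typing import List, Sequence, Tuple

# HYPOTHETICAL windows (cm^-1), widened around the three carotenoid bands named
# in the abstract. The paper's eight actual ranges are not given there.
WINDOWS: List[Tuple[float, float]] = [
    (990.0, 1020.0),
    (1140.0, 1170.0),
    (1500.0, 1540.0),
]


def window_values(wavenumbers: Sequence[float], intensities: Sequence[float],
                  lo: float, hi: float) -> List[float]:
    """Return the intensities whose wavenumber lies in the closed range [lo, hi]."""
    return [y for x, y in zip(wavenumbers, intensities) if lo <= x <= hi]


def count_peaks(ys: Sequence[float]) -> int:
    """Count strict local maxima (ASSUMED peak definition)."""
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] > ys[i + 1])


def roughness(ys: Sequence[float]) -> float:
    """Mean absolute second difference (ASSUMED proxy for curve roughness)."""
    if len(ys) < 3:
        return 0.0
    second = [abs(ys[i - 1] - 2.0 * ys[i] + ys[i + 1]) for i in range(1, len(ys) - 1)]
    return sum(second) / len(second)


def extract_features(wavenumbers: Sequence[float], intensities: Sequence[float],
                     windows: Sequence[Tuple[float, float]] = WINDOWS) -> List[float]:
    """Concatenate [n_peaks, max_peak, roughness] for every window."""
    features: List[float] = []
    for lo, hi in windows:
        ys = window_values(wavenumbers, intensities, lo, hi)
        features.append(float(count_peaks(ys)))
        features.append(max(ys) if ys else 0.0)
        features.append(roughness(ys))
    return features


def gaussian(x: float, center: float, height: float, width: float) -> float:
    """Gaussian band used only to build the synthetic demo spectrum."""
    return height * math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


# Synthetic (not measured) spectrum with one band near each carotenoid position.
wn = [float(x) for x in range(900, 1601)]
spectrum = [gaussian(x, 1004, 1.0, 3.0) + gaussian(x, 1155, 0.6, 3.0)
            + gaussian(x, 1520, 0.9, 3.0) for x in wn]
features = extract_features(wn, spectrum)
print(features)
```

The result is one fixed-length vector per spectrum: three values per window. Stacking these vectors for all spectra gives a feature matrix that a tabular classifier can consume directly. This is the "raw spectral feature" route, in contrast to feeding principal components to XGBoost.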