Cheng Kaiqi, Dong Ruonan, Pan Fei, Su Wen, Xi Lingjie, Zhang Meng, Geng Jingzhang, Gao Ruichang, Jin Wengang, Abd El-Aty A M
Qinba State Key Laboratory of Biological Resources and Ecological Environment, QinLing-Bashan Mountains Bioresources Comprehensive Development 2011 C. I. C, Shaanxi Province Key Laboratory of Bio-Resources, College of Bioscience and Bioengineering, Shaanxi University of Technology, Hanzhong, China.
Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing, China.
Front Nutr. 2025 May 20;12:1598875. doi: 10.3389/fnut.2025.1598875. eCollection 2025.
Pigmented rice appeals to consumers because of its abundant phytochemicals and unique aroma.
In this study, GC-MS-based metabolomics was performed on Yangxian colored rice varieties, and their volatile metabolites were characterized using multivariate statistics and machine learning algorithms.
A total of 357 volatile metabolites were detected and classified into 9 groups, including 96 organooxygen compounds (26.89%), 52 carboxylic acids and derivatives (14.57%), 42 fatty acyls (11.76%), 16 benzene and substituted derivatives (4.48%), and 11 hydroxy acids and derivatives (3.08%). PLS-DA-based multivariate statistics screened 127 differentially abundant metabolites. In principal component analysis, PC1 and PC2 accounted for 52.48% and 27.09% of the variance, respectively. After screening the differential metabolites for multicollinearity (threshold of 0.8) and applying a chi-square test (20% of the features), only 7 metabolites were found to represent the overall metabolite profile across the colored rice varieties. Four machine learning models were then used to classify the colored rice varieties; the random forest model performed best, with an accuracy of 0.97. Moreover, Shapley additive explanations (SHAP) analysis indicated that the 7 metabolites can serve as potential markers of the metabolomic profiles.
These results suggest that GC-MS-based metabolomics combined with a random forest model may be effective for extracting key features that distinguish pigmented rice varieties.
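The two-stage feature screening mentioned above (a multicollinearity cutoff of 0.8 followed by a chi-square test keeping 20% of the features) can be illustrated with a minimal sketch. The abstract does not describe the actual procedure, so everything here is an assumption: multicollinearity is taken to mean absolute Pearson correlation, the first feature of a correlated pair is kept, and the chi-square score compares per-class feature sums with the sums expected from class frequencies. The metabolite names (`m1` to `m5`), variety labels, and abundance values are hypothetical toy data, not values from the study.

```python
from math import ceil, sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def drop_collinear(columns, threshold=0.8):
    """Greedy filter (assumed rule): keep a feature only if its absolute
    correlation with every already-kept feature is at most `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def chi2_score(values, labels):
    """Chi-square statistic of one non-negative feature against class labels:
    observed per-class sums versus sums expected from class frequencies."""
    total = sum(values)
    if total == 0:
        return 0.0
    score = 0.0
    for cls in set(labels):
        observed = sum(v for v, y in zip(values, labels) if y == cls)
        expected = total * labels.count(cls) / len(labels)
        score += (observed - expected) ** 2 / expected
    return score

def select_top_fraction(columns, labels, names, fraction=0.2):
    """Rank `names` by chi-square score and keep the top `fraction` (at least one)."""
    n_keep = max(1, ceil(fraction * len(names)))
    ranked = sorted(names, key=lambda n: chi2_score(columns[n], labels), reverse=True)
    return ranked[:n_keep]

# Hypothetical toy abundances for six samples from three varieties.
labels = ["black", "black", "red", "red", "purple", "purple"]
columns = {
    "m1": [9, 8, 1, 2, 1, 2],    # elevated in "black"
    "m2": [18, 17, 2, 5, 3, 4],  # nearly proportional to m1 (collinear)
    "m3": [1, 2, 9, 9, 2, 1],    # elevated in "red"
    "m4": [5, 4, 5, 4, 5, 4],    # unrelated to variety
    "m5": [3, 5, 4, 3, 5, 4],    # unrelated to variety
}

kept = drop_collinear(columns, threshold=0.8)
selected = select_top_fraction(columns, labels, kept, fraction=0.2)
print("after collinearity filter:", kept)
print("after chi-square selection:", selected)
```

In a fuller workflow, the selected features would then feed a classifier such as a random forest, with SHAP values used to inspect each feature's contribution; those steps are omitted here to keep the sketch dependency-free.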