Machine Learning-Driven Identification of Key Environmental Factors Influencing Fiber Yield and Quality Traits in Upland Cotton.

Author information

Souaibou Mohamadou, Yan Haoliang, Dai Panhong, Pan Jingtao, Li Yang, Shi Yuzhen, Gong Wankui, Shang Haihong, Gong Juwu, Yuan Youlu

Affiliations

State Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China.

Zhengzhou Research Base, State Key Laboratory of Cotton Bio-Breeding and Integrated Utilization, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China.

Publication information

Plants (Basel). 2025 Jul 4;14(13):2053. doi: 10.3390/plants14132053.

Abstract

Understanding the influence of environmental factors on cotton performance is crucial for enhancing yield and fiber quality in the context of climate change. This study investigates genotype-by-environment (G×E) interactions in cotton, using data from 250 recombinant inbred lines (CCRI70 RILs) cultivated across 14 diverse environments in China's major cotton cultivation areas. Our findings reveal that environmental effects predominantly influenced yield-related traits (boll weight, lint percentage, and seed index), accounting for 34.7-55.7% of their variance. In contrast, fiber quality traits showed lower environmental sensitivity (12.3-27.0%), with notable phenotypic plasticity observed in boll weight, lint percentage, and fiber micronaire. Among the six machine learning models evaluated, Random Forest demonstrated superior predictive ability (R² = 0.40-0.72; predictive Pearson correlation = 0.63-0.86). Through SHAP-based interpretation and sliding-window regression, we identified key environmental drivers primarily active during mid-to-late growth stages. This approach effectively reduced the number of influential input variables to just 0.1-2.4% of the original dataset, spanning 2-9 critical time windows per trait. Incorporating these identified drivers significantly improved cross-environment predictions, enhancing Random Forest accuracy by 0.02-0.15. These results underscore the strong potential of machine learning to uncover critical temporal environmental factors underlying G×E interactions and to substantially improve predictive modeling in cotton breeding programs, ultimately contributing to more resilient and productive cotton cultivation.
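The abstract does not spell out how the sliding-window regression was set up. As a rough illustration of the general idea only, the sketch below scans every candidate time window of a daily environmental covariate and ranks windows by how strongly the window mean correlates with the per-environment trait mean. Everything in it is an assumption made for illustration: the function names (`pearson`, `sliding_window_scan`), the window parameters, the 150-day season, and the synthetic, noise-free data are hypothetical and are not taken from the paper.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sliding_window_scan(daily, trait_means, min_len=10, step=5):
    """Rank time windows of a daily covariate by |r| with the trait.

    daily:       one list of daily values per environment (all the same length)
    trait_means: one mean trait value per environment
    Returns (start_day, end_day, r) tuples, end exclusive, strongest first.
    """
    n_days = len(daily[0])
    results = []
    for start in range(0, n_days - min_len + 1, step):
        for end in range(start + min_len, n_days + 1, step):
            # Summarize each environment by its mean over the window.
            window_means = [sum(env[start:end]) / (end - start) for env in daily]
            results.append((start, end, pearson(window_means, trait_means)))
    results.sort(key=lambda item: abs(item[2]), reverse=True)
    return results


# Synthetic example (hypothetical data): 14 environments, 150 days of temperature.
rng = random.Random(0)
N_ENVS, N_DAYS = 14, 150
daily_temp = [[25 + rng.gauss(0, 3) for _ in range(N_DAYS)] for _ in range(N_ENVS)]

# By construction, the trait depends only on mean temperature over days 90-119.
# It is kept noise-free so that the planted window is recovered unambiguously.
trait_means = [2.0 + 0.5 * (sum(env[90:120]) / 30) for env in daily_temp]

ranked = sliding_window_scan(daily_temp, trait_means)
best = ranked[0]
print("strongest window:", best[0], "to", best[1], "r =", round(best[2], 3))
```

With real field data the trait would carry noise and the covariates would be correlated across days, so the top-ranked windows would need validation on held-out environments before being used as model inputs.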

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d464/12252131/073f769b7c48/plants-14-02053-g001.jpg
