Zinati Zahra, Nazari Leyla, Niazi Ali
Department of Agroecology, College of Agriculture and Natural Resources of Darab, Shiraz University, Shiraz, Iran.
Crop and Horticultural Science Research Department, Fars Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Shiraz, Iran.
Bot Stud. 2024 Aug 14;65(1):25. doi: 10.1186/s40529-024-00433-z.
As climate change intensifies, the frequency and severity of waterlogging are expected to increase, necessitating a deeper understanding of the cucumber response to this stress. In this study, three public RNA-seq datasets (PRJNA799460, PRJNA844418, and PRJNA678740) comprising 36 samples were analyzed. Several feature selection algorithms with different characteristics, namely Uncertainty, Relief, SVM (Support Vector Machine), Correlation, and logistic least absolute shrinkage and selection operator (LASSO), were applied to reduce the complexity of the data and identify the genes most significantly related to the waterlogging stress response. Uncertainty, Relief, SVM, Correlation, and LASSO identified 4, 4, 10, 21, and 13 genes, respectively. Differential gene correlation analysis (DGCA) of the 36 selected genes identified changes in correlation patterns between waterlogged and control conditions, providing deeper insight into the regulatory networks and interactions among the selected genes. DGCA revealed significant changes in the correlation of 13 genes between control and waterlogging conditions. Finally, we validated 13 genes using a Random Forest (RF) classifier, which achieved 100% accuracy and an Area Under the Curve (AUC) score of 1.0. SHapley Additive exPlanations (SHAP) values showed that LOC101209599, LOC101217277, and LOC101216320 had a significant impact on the model's predictive power. In addition, we employed Boruta, a wrapper feature selection method, to further validate our gene selection strategy. Eight of the 13 genes were common across the four feature weighting algorithms, LASSO, DGCA, and Boruta, underscoring the robustness and reliability of our gene selection strategy.
Notably, the genes LOC101209599, LOC101217277, and LOC101216320 were among the genes identified by multiple feature selection methods from different categories (filter, wrapper, and embedded). Pathways associated with these genes play a pivotal role in regulating stress tolerance, root development, nutrient absorption, sugar metabolism, gene expression, protein degradation, and calcium signaling. These regulatory mechanisms are crucial for cucumbers to adapt effectively to waterlogging conditions. The findings provide valuable insights for identifying targets for breeding new cucumber varieties with enhanced stress tolerance.
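
The differential correlation step described in the abstract can be illustrated with a minimal sketch. It is not the DGCA package and not the authors' code: it only shows the general idea of comparing one gene pair's Pearson correlation between two conditions using Fisher's z-transform. The function names and the toy expression values are hypothetical, chosen purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r, eps=1e-12):
    """Fisher z-transform, clamped so that |r| == 1 stays finite."""
    r = max(min(r, 1.0 - eps), -1.0 + eps)
    return math.atanh(r)


def differential_correlation(x_ctrl, y_ctrl, x_trt, y_trt):
    """Compare one gene pair's correlation between control and treatment.

    Returns (r_ctrl, r_trt, z_diff, p_value), where z_diff is the
    difference of Fisher-transformed correlations divided by its
    standard error, and p_value is two-sided under a normal approximation.
    """
    n_c, n_t = len(x_ctrl), len(x_trt)
    if n_c <= 3 or n_t <= 3:
        raise ValueError("each condition needs more than 3 samples")
    r_c = pearson(x_ctrl, y_ctrl)
    r_t = pearson(x_trt, y_trt)
    se = math.sqrt(1.0 / (n_c - 3) + 1.0 / (n_t - 3))
    z_diff = (fisher_z(r_t) - fisher_z(r_c)) / se
    p_value = math.erfc(abs(z_diff) / math.sqrt(2.0))
    return r_c, r_t, z_diff, p_value


# Hypothetical toy expression values (not data from the study):
# the pair is positively correlated in control, negatively under waterlogging.
gene_a_ctrl = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b_ctrl = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
gene_a_wl = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b_wl = [12.0, 10.1, 8.2, 5.9, 4.1, 2.0]

r_c, r_t, z_diff, p = differential_correlation(
    gene_a_ctrl, gene_b_ctrl, gene_a_wl, gene_b_wl
)
print(f"r_control={r_c:.3f} r_waterlogged={r_t:.3f} z={z_diff:.2f} p={p:.3g}")
```

In a real analysis this comparison would be run over all pairs among the selected genes, with multiple-testing correction applied to the resulting p-values; the sketch omits both for brevity.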