ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India.
ICAR-National Bureau of Plant Genetic Resources, New Delhi, India.
Plant Genome. 2024 Mar;17(1):e20259. doi: 10.1002/tpg2.20259. Epub 2022 Sep 13.
One of the thrust areas of plant breeding research is the development of crop cultivars with enhanced tolerance to abiotic stresses. Identifying abiotic stress-responsive genes (SRGs) and proteins is therefore important for plant breeding research. However, identifying such genes through established genetic approaches is laborious and resource intensive. Although transcriptome profiling remains a reliable method for SRG identification, it is species specific. Additionally, identifying multistress-responsive genes through gene expression studies is cumbersome. These limitations underline the need for a computational method to identify genes associated with different abiotic stresses. In this work, we aimed to develop a computational model for identifying genes responsive to six abiotic stresses: cold, drought, heat, light, oxidative, and salt. Predictions were performed using support vector machine (SVM), random forest, adaptive boosting (ADB), and extreme gradient boosting (XGB), with autocross covariance (ACC) and K-mer compositional features as input. With ACC, K-mer, and ACC + K-mer compositional features, the SVM algorithm achieved overall accuracies of ∼60-77%, ∼75-86%, and ∼61-78%, respectively, under fivefold cross-validation. The SVM also achieved higher accuracy than the other three algorithms. The proposed model was further assessed on an independent dataset, where its accuracy was consistent with that obtained in cross-validation. The proposed model is the first of its kind and is expected to meet the needs of experimental biologists, although its prediction accuracy is modest. Given its importance for the research community, the online prediction application ASRpro is freely available (https://iasri-sg.icar.gov.in/asrpro/) for predicting abiotic SRGs and proteins.
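To make the two feature families concrete, here is a minimal Python sketch of K-mer composition and a generic auto/cross covariance (ACC) encoding. It is an illustration under stated assumptions, not the ASRpro implementation: the function names `kmer_composition` and `auto_cross_covariance`, the per-residue property table `HYPOTHETICAL_PROPS` (its names and values are invented), and the lag range are all hypothetical. Published ACC variants often use dinucleotide or amino acid physicochemical indices instead of the single-residue values used here.

```python
from itertools import product

# HYPOTHETICAL per-nucleotide property values, invented for illustration only;
# real ACC encodings use published physicochemical indices.
HYPOTHETICAL_PROPS = {
    "p1": {"A": 1.0, "C": -1.0, "G": 0.5, "T": -0.5},
    "p2": {"A": 0.2, "C": 0.8, "G": -0.6, "T": -0.4},
}


def kmer_composition(seq, k, alphabet="ACGT"):
    """Normalized frequency of every possible k-mer (len(alphabet)**k features)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous symbols such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def auto_cross_covariance(seq, properties, max_lag):
    """Covariance between property series at lags 1..max_lag.

    `properties` maps property name -> {residue: value}. For each lag and each
    ordered pair (a, b) of properties, the feature is
        mean over i of (P_a(i) - mean_a) * (P_b(i + lag) - mean_b),
    i.e. auto-covariance when a == b and cross-covariance otherwise, giving
    len(properties)**2 * max_lag features in total.
    """
    seq = seq.upper()
    length = len(seq)
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    names = sorted(properties)
    series = {n: [properties[n][c] for c in seq] for n in names}  # KeyError on unknown residues
    means = {n: sum(v) / length for n, v in series.items()}
    features = []
    for lag in range(1, max_lag + 1):
        for a in names:
            for b in names:
                s = sum(
                    (series[a][i] - means[a]) * (series[b][i + lag] - means[b])
                    for i in range(length - lag)
                )
                features.append(s / (length - lag))
    return features


if __name__ == "__main__":
    seq = "ATGCGTACGTTAGC"
    kmer_feats = kmer_composition(seq, 2)  # 16 dinucleotide frequencies
    acc_feats = auto_cross_covariance(seq, HYPOTHETICAL_PROPS, max_lag=3)  # 2*2*3 = 12
    combined = kmer_feats + acc_feats  # "ACC + K-mer" style concatenation
    print(len(kmer_feats), len(acc_feats), len(combined))
```

Vectors built this way, one row per sequence, could then be passed to a classifier; for instance, scikit-learn's `SVC` evaluated with `cross_val_score(..., cv=5)` would mirror the fivefold cross-validation setup described above, although the kernel, hyperparameters, and exact feature settings used for ASRpro are not given here.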