Pradhan Upendra Kumar, Behera Prasanjit, Das Ritwika, Naha Sanchita, Gupta Ajit, Parsad Rajender, Pradhan Sukanta Kumar, Meher Prabina Kumar
Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, Odisha 751003, India.
Comput Biol Chem. 2024 Dec;113:108205. doi: 10.1016/j.compbiolchem.2024.108205. Epub 2024 Sep 6.
In plant biology, understanding the regulatory mechanisms that govern stress responses is a central goal. Circular RNAs (circRNAs), which are emerging as critical players in gene regulation, have attracted attention in recent years for their potential roles in abiotic stress adaptation. A thorough understanding of circRNA function in the stress response gives breeders a route to developing abiotic stress-resistant crop cultivars that can thrive in challenging climates. This study pioneers a machine learning-based model for predicting abiotic stress-responsive circRNAs. K-tuple nucleotide composition (KNC) and pseudo KNC (PKNC) features were used to represent circRNAs numerically. Three feature selection strategies were employed to select relevant and non-redundant features. Eight shallow and four deep learning algorithms were evaluated to build the final predictive model. Under five-fold cross-validation, the XGBoost learning algorithm performed best, using 260 KNC features selected by LightGBM (accuracy: 74.55%, auROC: 81.23%, auPRC: 76.52%) and 160 PKNC features selected by LightGBM (accuracy: 74.32%, auROC: 81.04%, auPRC: 76.43%), outperforming all other combinations of learning algorithm and feature selection technique. The robustness of the developed models was then evaluated on an independent test dataset, where the overall accuracy, auROC and auPRC were 73.13%, 72.34% and 72.68% for the KNC feature set and 73.52%, 79.53% and 73.09% for the PKNC feature set, respectively. The approach has also been integrated into an online prediction tool, AScirRNA (https://iasri-sg.icar.gov.in/ascirna/), for easy use. Both the proposed model and the tool are expected to support ongoing efforts to identify stress-responsive circRNAs in plants.
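As an illustration of the KNC representation mentioned above, the following is a minimal sketch of how a sequence might be turned into a k-tuple composition vector. It is not the authors' implementation: the function names `knc_features` and `knc_vector`, the choice of k values (1 to 3), the mapping of U to T, and the normalisation by the number of valid k-mers are all assumptions made for the example, since the abstract does not specify them.

```python
from itertools import product
from typing import Dict, List, Sequence

ALPHABET = "ACGT"  # assumption: RNA sequences are mapped to the DNA alphabet


def knc_features(seq: str, k: int) -> Dict[str, float]:
    """Return the normalised frequency of every possible k-mer in seq.

    Hypothetical helper: k-mers containing characters outside ALPHABET
    (for example N) are skipped and do not count towards the total.
    """
    seq = seq.upper().replace("U", "T")
    # Enumerate all 4**k k-mers in lexicographic order.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: c / total for m, c in counts.items()}


def knc_vector(seq: str, ks: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Concatenate the k-mer frequency vectors for each k in ks."""
    vec: List[float] = []
    for k in ks:
        vec.extend(knc_features(seq, k).values())
    return vec


if __name__ == "__main__":
    example = "ACGUACGUGGCA"  # made-up sequence, for illustration only
    features = knc_vector(example)
    print(len(features))  # 4 + 16 + 64 features for k = 1, 2, 3
```

A vector of this kind could then be passed to a feature selection step and a classifier, as described in the abstract. PKNC features additionally incorporate physicochemical properties of nucleotides and are not covered by this sketch.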