Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China.
Hunan Provincial Engineering and Technology Research Center for Agricultural Big Data Analysis and Decision-Making, Hunan Agricultural University, Changsha, 410128, China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha, 410128, China; Shanxi Province Jincheng City Landscaping Service Center, Shanxi, 048000, China.
Phytochemistry. 2022 Aug;200:113222. doi: 10.1016/j.phytochem.2022.113222. Epub 2022 May 11.
In crops, RNA editing is one of the most important post-transcriptional processes, in which specific cytidines (C) in virtually all mitochondrial protein-coding genes are converted to uridines (U). Despite extensive recent research on RNA editing, efficiently exploring all C-to-U editing events on the genomic scale remains challenging. Accurate methods for predicting RNA editing sites would dramatically reduce the need for experimental determination. We therefore propose a novel method, iPReditor-CMG (improved predictive RNA editor for crop mitochondrial genomes), which predicts crop mitochondrial editing sites from genome sequence using an optimised support vector machine (SVM). We first selected three mitochondrial genomes with known RNA editing sites, from Arabidopsis thaliana, Brassica napus and Oryza sativa, released by NCBI, as the training and test sets. Genes and their transcripts from tobacco mitochondrial ATPase, sequenced in-house, were used as the validation set. iPReditor-CMG first encoded the genome sequences as numerical vectors and then performed efficient feature selection on the high-dimensional feature space, with the SVM used for both feature selection and subsequent modelling. The average independent prediction accuracy for intraspecific editing sites across the three species was 0.85, and reached 0.91 in A. thaliana, outperforming the reference models. For interspecific independent prediction, the accuracy between dicotyledons was 0.78 and the accuracy between dicotyledons and monocotyledons was 0.56, which implies that the C-to-U editing mechanism may be similar in close relatives. Finally, the best model achieved an independent test accuracy of 0.91 and an AUC of 0.88, and suggested that five previously unreported feature sequences, i.e. TGACA, ACAAC, GTAGA, CCGTT and TAACA, are closely associated with the editing phenomenon.
Multiple tests indicated that iPReditor-CMG can be effectively applied to predict editing sites in crop mitochondria, which may further contribute to understanding the mechanisms of site editing and post-transcriptional events in crop mitochondria.
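The step of coding genome sequences as numerical vectors can be illustrated with a minimal sketch. The abstract does not specify the encoding scheme, so everything here is an assumption made for illustration: the use of 5-mer counts (chosen only because the reported feature sequences are 5-mers), the flanking window of 20 bases, and the hypothetical helper names `candidate_windows` and `kmer_vector`. The resulting vectors are the kind of input on which an SVM-based feature selection and classifier could then be trained; this is not the authors' implementation.

```python
from collections import Counter
from itertools import product

K = 5       # k-mer length; assumed, because the reported motifs are 5-mers
FLANK = 20  # hypothetical half-width of the window around each cytidine


def candidate_windows(seq, flank=FLANK):
    """Yield (position, window) for each C that has a full flank on both sides."""
    seq = seq.upper()
    for i, base in enumerate(seq):
        if base == "C" and i >= flank and i + flank < len(seq):
            yield i, seq[i - flank : i + flank + 1]


def kmer_vector(window, k=K, vocabulary=None):
    """Encode a sequence window as a vector of k-mer counts.

    By default the vocabulary is all 4**k k-mers over A, C, G, T,
    which gives a high-dimensional feature space (1024 features for k=5).
    """
    if vocabulary is None:
        vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(window[j : j + k] for j in range(len(window) - k + 1))
    return [counts[kmer] for kmer in vocabulary]


# Toy sequence, not real mitochondrial data.
toy_sequence = "ACGT" * 30
windows = list(candidate_windows(toy_sequence))
features = [kmer_vector(window) for _, window in windows]
```

Each candidate cytidine becomes one row of `features`; labels (edited or not edited) would come from the known editing sites of the training genomes.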