PCirc: random forest-based plant circRNA identification software.

Affiliation

National Engineering Laboratory for Resource Development of Endangered Crude Drugs in Northwest China, The Key Laboratory of Medicinal Resources and Natural Pharmaceutical Chemistry, The Ministry of Education, College of Life Sciences, Shaanxi Normal University, Xi'an, 710119, Shaanxi, People's Republic of China.

Publication information

BMC Bioinformatics. 2021 Jan 6;22(1):10. doi: 10.1186/s12859-020-03944-1.

Abstract

BACKGROUND

Circular RNA (circRNA) is a novel type of RNA with a closed-loop structure. Growing numbers of circRNAs are being identified in plants and animals, and recent studies have shown that circRNAs play an important role in gene regulation. Identifying circRNAs in the rapidly growing body of RNA-seq data is therefore very important. However, traditional circRNA recognition methods have limitations. In recent years, machine learning techniques have provided a good approach for identifying circRNAs in animals. The features used by these animal-oriented methods cannot simply be reused for plants, however, because the characteristics of plant circRNA sequences differ from those of animal circRNAs. For example, plants are extremely rich in splicing signals and transposable elements, and sequence conservation in rice, for instance, is far lower than in mammals. To address these problems and better identify circRNAs in plants, there is an urgent need for machine learning-based circRNA recognition software built on the characteristics of plant circRNAs.

RESULTS

In this study, we built a software program named PCirc that uses a machine learning method to predict plant circRNAs from RNA-seq data. First, we extracted several features, including open reading frames, k-mer counts, and splice junction sequence coding, from rice circRNA and lncRNA data. Second, we trained a random forest model with tenfold cross-validation on the training set. Third, we evaluated the classifier by accuracy, precision, and F1 score; all scores on the test data were above 0.99. Fourth, we tested the model on data from other plant species and obtained good results, with accuracy above 0.8. Finally, we packaged the trained model and the scripts used into a locally run circRNA prediction tool, PCirc ( https://github.com/Lilab-SNNU/Pcirc ).
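To make two of the sequence features named above more concrete, the following is a minimal Python sketch of k-mer counting and longest open reading frame detection. It is an illustration only, not the PCirc implementation: the function names (`kmer_counts`, `longest_orf_length`, `feature_vector`), the choice of k = 3, and the forward-strand, linear-sequence ORF scan are all assumptions made here. The splice junction sequence coding is not sketched, because the abstract does not describe its encoding.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_counts(seq, k=3):
    """Count overlapping k-mers over ACGT, returned in a fixed lexicographic order."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows containing N or other symbols are skipped
            counts[window] += 1
    return [counts[m] for m in kmers]


def longest_orf_length(seq):
    """Length (nt, stop codon included) of the longest ATG...stop ORF.

    Simplification (assumption): only the forward strand of the linear
    sequence is scanned; ORFs spanning a back-splice junction are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def feature_vector(seq, k=3):
    """Concatenate k-mer counts with the longest ORF length (hypothetical layout)."""
    return kmer_counts(seq, k) + [longest_orf_length(seq)]
```

A vector built this way for each candidate sequence could then be passed to any standard random forest implementation; which k values and ORF descriptors PCirc actually uses should be checked in its repository.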

CONCLUSION

Using rice circRNA and lncRNA data, we constructed a random forest model for plant circRNA recognition; the model can also be applied to other plants, such as Arabidopsis thaliana and maize. The trained model and the scripts used in this study are packaged as a locally run circRNA prediction tool, PCirc, for convenient use by plant circRNA researchers.
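For readers who want to reproduce the kind of evaluation reported in this abstract on their own data (for example, predictions for another plant species), the sketch below computes accuracy, precision, and F1 score from binary labels using their standard definitions. The function name `binary_scores` and the label convention (1 = circRNA, 0 = lncRNA) are assumptions for illustration and are not taken from PCirc.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up for illustration): 1 = circRNA, 0 = lncRNA.
example = binary_scores([1, 1, 1, 0, 0, 0, 1, 0],
                        [1, 1, 0, 0, 0, 1, 1, 0])
```

Reporting precision and F1 alongside accuracy matters when the two classes are imbalanced, since accuracy alone can look high even if few circRNAs are recovered.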

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a69/7789375/14036a567fc6/12859_2020_3944_Fig1_HTML.jpg
