PCirc: random forest-based plant circRNA identification software.

Affiliation

National Engineering Laboratory for Resource Development of Endangered Crude Drugs in Northwest China, The Key Laboratory of Medicinal Resources and Natural Pharmaceutical Chemistry, The Ministry of Education, College of Life Sciences, Shaanxi Normal University, Xi'an, 710119, Shaanxi, People's Republic of China.

Publication information

BMC Bioinformatics. 2021 Jan 6;22(1):10. doi: 10.1186/s12859-020-03944-1.

DOI: 10.1186/s12859-020-03944-1
PMID: 33407069
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7789375/
Abstract

BACKGROUND

Circular RNA (circRNA) is a novel type of RNA with a closed-loop structure. Increasing numbers of circRNAs are being identified in plants and animals, and recent studies have shown that circRNAs play an important role in gene regulation. Identifying circRNAs from the growing volume of RNA-seq data is therefore very important. However, traditional circRNA recognition methods have limitations. In recent years, emerging machine learning techniques have provided a good approach for identifying circRNAs in animals. The features used for animal circRNAs cannot be applied directly to plants, however, because the characteristics of plant circRNA sequences differ from those of animal circRNAs. For example, plants are extremely rich in splicing signals and transposable elements, and sequence conservation in plants such as rice is far lower than in mammals. To solve these problems and better identify circRNAs in plants, circRNA recognition software that applies machine learning to the characteristics of plant circRNAs is urgently needed.

RESULTS

In this study, we built a software program named PCirc that uses a machine learning method to predict plant circRNAs from RNA-seq data. First, we extracted different features, including open reading frames, numbers of k-mers, and splicing junction sequence coding, from rice circRNA and lncRNA data. Second, we trained a machine learning model with the random forest algorithm, using tenfold cross-validation on the training set. Third, we evaluated the classifier by accuracy, precision, and F1 score; all scores on the model test data were above 0.99. Fourth, we tested the model on data from other plant species and obtained good results, with accuracy scores above 0.8. Finally, we packaged the trained model and the scripts used into a locally run circRNA prediction software, PCirc ( https://github.com/Lilab-SNNU/Pcirc ).
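Two of the feature types named above, k-mer counts and open reading frames, can be illustrated with a minimal sketch. This is not PCirc's actual code: the function names, the choice of k = 3, and the ORF definition (ATG to the first in-frame stop codon, forward strand only) are assumptions made for illustration, and the splicing junction sequence coding is omitted because the abstract does not describe it.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_counts(seq, k=3):
    """Count all 4**k k-mers over ACGT using a sliding window of step 1."""
    seq = seq.upper().replace("U", "T")
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    return counts


def longest_orf_length(seq):
    """Length in nt (stop codon included) of the longest ATG...stop ORF
    across the three forward reading frames; 0 if none is found."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def feature_vector(seq, k=3):
    """Concatenate k-mer counts (in sorted k-mer order) with the ORF length."""
    counts = kmer_counts(seq, k)
    return [counts[key] for key in sorted(counts)] + [longest_orf_length(seq)]
```

A vector built this way for each circRNA and lncRNA sequence could then be passed to any random forest implementation; the features and parameters PCirc actually uses are documented in its repository.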

CONCLUSION

Using rice circRNA and lncRNA data, this study constructed a machine learning model for plant circRNA recognition with the random forest algorithm; the model can also be applied to circRNA recognition in other plants such as Arabidopsis thaliana and maize. The trained model and the scripts used in this study are packaged into PCirc, a locally run circRNA prediction software that is convenient for plant circRNA researchers to use.
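The three scores reported in the Results (accuracy, precision, and F1) follow directly from the confusion-matrix counts of a binary classifier. The sketch below is a generic illustration, not taken from PCirc; the function name and the convention that label 1 marks a circRNA are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

F1 balances precision against recall, so it stays informative when the circRNA and lncRNA classes are of unequal size, where accuracy alone can be misleading.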

Similar articles

1
PCirc: random forest-based plant circRNA identification software.
BMC Bioinformatics. 2021 Jan 6;22(1):10. doi: 10.1186/s12859-020-03944-1.
2
circRNA-binding protein site prediction based on multi-view deep learning, subspace learning and multi-view classifier.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab394.
3
Computational Prediction of Human Disease-Associated circRNAs Based on Manifold Regularization Learning Framework.
IEEE J Biomed Health Inform. 2019 Nov;23(6):2661-2669. doi: 10.1109/JBHI.2019.2891779. Epub 2019 Jan 9.
4
Widespread noncoding circular RNAs in plants.
New Phytol. 2015 Oct;208(1):88-95. doi: 10.1111/nph.13585. Epub 2015 Jul 22.
5
A machine learning framework for accurately recognizing circular RNAs for clinical decision-supporting.
BMC Med Inform Decis Mak. 2020 Jul 9;20(Suppl 3):137. doi: 10.1186/s12911-020-1117-0.
6
Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning.
Front Genet. 2020 Jul 21;11:655. doi: 10.3389/fgene.2020.00655. eCollection 2020.
7
GBDTCDA: Predicting circRNA-disease Associations Based on Gradient Boosting Decision Tree with Multiple Biological Data Fusion.
Int J Biol Sci. 2019 Nov 8;15(13):2911-2924. doi: 10.7150/ijbs.33806. eCollection 2019.
8
circMeta: a unified computational framework for genomic feature annotation and differential expression analysis of circular RNAs.
Bioinformatics. 2020 Jan 15;36(2):539-545. doi: 10.1093/bioinformatics/btz606.
9
Collaborative deep learning improves disease-related circRNA prediction based on multi-source functional information.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad069.
10
Biogenesis mechanisms of circular RNA can be categorized through feature extraction of a machine learning model.
Bioinformatics. 2019 Dec 1;35(23):4867-4870. doi: 10.1093/bioinformatics/btz705.

Articles citing this paper

1
Computational approaches and challenges in the analysis of circRNA data.
BMC Genomics. 2024 May 28;25(1):527. doi: 10.1186/s12864-024-10420-0.
2
CircRNA identification and feature interpretability analysis.
BMC Biol. 2024 Feb 27;22(1):44. doi: 10.1186/s12915-023-01804-x.
3
A computational model of circRNA-associated diseases based on a graph neural network: prediction and case studies for follow-up experimental validation.
BMC Biol. 2024 Jan 29;22(1):24. doi: 10.1186/s12915-024-01826-z.
4
Identification, characterization and expression analysis of circRNA encoded by SARS-CoV-1 and SARS-CoV-2.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbad537.
5
New insight into circRNAs: characterization, strategies, and biomedical applications.
Exp Hematol Oncol. 2023 Oct 12;12(1):91. doi: 10.1186/s40164-023-00451-w.
6
CircPCBL: Identification of Plant CircRNAs with a CNN-BiGRU-GLT Model.
Plants (Basel). 2023 Apr 14;12(8):1652. doi: 10.3390/plants12081652.
7
PseU-ST: A new stacked ensemble-learning method for identifying RNA pseudouridine sites.
Front Genet. 2023 Jan 19;14:1121694. doi: 10.3389/fgene.2023.1121694. eCollection 2023.
8
Identification, biogenesis, function, and mechanism of action of circular RNAs in plants.
Plant Commun. 2023 Jan 9;4(1):100430. doi: 10.1016/j.xplc.2022.100430. Epub 2022 Sep 7.
9
Evaluation of CircRNA Sequence Assembly Methods Using Long Reads.
Front Genet. 2022 Feb 14;13:816825. doi: 10.3389/fgene.2022.816825. eCollection 2022.
10
Advances in Non-Coding RNA Sequencing.
Noncoding RNA. 2021 Oct 30;7(4):70. doi: 10.3390/ncrna7040070.

References cited in this article

1
Constitutive Expression of miR408 Improves Biomass and Seed Yield in Arabidopsis.
Front Plant Sci. 2018 Jan 25;8:2114. doi: 10.3389/fpls.2017.02114. eCollection 2017.