A novel riboswitch classification based on imbalanced sequences achieved by machine learning.

Affiliations

Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou, China.

School of Medicine, Zhejiang University, Hangzhou, Zhejiang, China.

Publication information

PLoS Comput Biol. 2020 Jul 20;16(7):e1007760. doi: 10.1371/journal.pcbi.1007760. eCollection 2020 Jul.

DOI:10.1371/journal.pcbi.1007760
PMID:32687488
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7392346/
Abstract

A riboswitch is a regulatory segment of mRNA (50-250 nt in length) with two main domains: an aptamer and an expression platform. One of the main challenges in riboswitch classification is imbalanced data, in which one group has far fewer sequence records than the others. Such imbalance leads a classifier to ignore the minority group and emphasize the majority ones, which results in skewed classification. In accord with recent riboswitch classification work, we considered sixteen riboswitch families that contain imbalanced sequences. The sequences were split into training and test sets using a newly developed pipeline. From the 5460 k-mers produced (k from 1 to 6), 156 features were selected using the CfsSubsetEval and BestFirst functions in WEKA 3.8. Statistical testing showed a significant difference between the results for balanced and imbalanced sequences (p < 0.05). In addition, each algorithm showed a significant difference in sensitivity, specificity, accuracy, and macro F-score between the two groups (p < 0.05). Several k-mers clustered in the heat map were found at different structural positions, such as interior loops, terminal loops, and helices; they were validated to have biological functions, and some are riboswitch motifs. The analysis highlights the importance of addressing majority bias and overfitting. The presented results are a generalized evaluation of both balanced and imbalanced models, which implies their ability to classify novel riboswitches. The Python source code is available at https://github.com/Seasonsling/riboswitch.
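
Two quantities in the abstract can be illustrated with a short sketch. The first is the k-mer feature space: over a four-letter alphabet, k = 1 to 6 gives 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 possible k-mers. The second is the macro F-score, which exposes majority bias that plain accuracy hides. The code below is not taken from the authors' repository. The function names, the per-k normalization of frequencies, and the definition of macro F as the unweighted mean of per-class F1 are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumption: RNA alphabet; DNA-style T is mapped to U


def kmer_vocabulary(k_min=1, k_max=6):
    """Enumerate every possible k-mer over ALPHABET for k_min <= k <= k_max."""
    return ["".join(p)
            for k in range(k_min, k_max + 1)
            for p in product(ALPHABET, repeat=k)]


def kmer_frequencies(seq, k_min=1, k_max=6):
    """Map each k-mer to its relative frequency in seq.

    Frequencies are normalized separately for each k (an assumption).
    Windows containing other symbols (e.g. N) match no feature.
    """
    seq = seq.upper().replace("T", "U")
    features = {}
    for k in range(k_min, k_max + 1):
        windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        counts = Counter(windows)
        total = len(windows)
        for p in product(ALPHABET, repeat=k):
            kmer = "".join(p)
            features[kmer] = counts[kmer] / total if total else 0.0
    return features


def macro_f_score(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(len(kmer_vocabulary()))  # size of the full feature space

    # Hypothetical toy labels: 9 majority-class and 1 minority-class sequence,
    # with a classifier that always predicts the majority class.
    y_true = ["majority"] * 9 + ["minority"]
    y_pred = ["majority"] * 10
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(accuracy, macro_f_score(y_true, y_pred))
```

In the toy example the always-majority classifier reaches 90% accuracy, yet its macro F-score is below 0.5 because the minority class contributes an F1 of zero. This is the kind of skew the abstract describes for imbalanced riboswitch families.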

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611a/7392346/43db5cddf99a/pcbi.1007760.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611a/7392346/a8132fbe0f6f/pcbi.1007760.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611a/7392346/0b7e8e4fb3bf/pcbi.1007760.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611a/7392346/caa850a7a5f3/pcbi.1007760.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611a/7392346/01fde9c194a7/pcbi.1007760.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611a/7392346/3fa4e1ed10da/pcbi.1007760.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/611a/7392346/e59691f9839e/pcbi.1007760.g007.jpg

Similar articles

1
A novel riboswitch classification based on imbalanced sequences achieved by machine learning.
PLoS Comput Biol. 2020 Jul 20;16(7):e1007760. doi: 10.1371/journal.pcbi.1007760. eCollection 2020 Jul.
2
Generative Modeling of RNA Sequence Families with Restricted Boltzmann Machines.
Methods Mol Biol. 2025;2847:163-175. doi: 10.1007/978-1-0716-4079-1_11.
3
Tuning the Performance of Synthetic Riboswitches using Machine Learning.
ACS Synth Biol. 2019 Jan 18;8(1):34-44. doi: 10.1021/acssynbio.8b00207. Epub 2019 Jan 8.
4
Partial RNA design.
Bioinformatics. 2024 Jun 28;40(Suppl 1):i437-i445. doi: 10.1093/bioinformatics/btae222.
5
Classification of riboswitch sequences using k-mer frequencies.
Biosystems. 2018 Dec;174:63-76. doi: 10.1016/j.biosystems.2018.09.001. Epub 2018 Sep 8.
6
Computational prediction of riboswitches.
Methods Enzymol. 2015;553:287-312. doi: 10.1016/bs.mie.2014.10.063. Epub 2015 Feb 19.
7
Secondary structural entropy in RNA switch (Riboswitch) identification.
BMC Bioinformatics. 2015 Apr 28;16:133. doi: 10.1186/s12859-015-0523-2.
8
Finding consensus stable local optimal structures for aligned RNA sequences and its application to discovering riboswitch elements.
Int J Bioinform Res Appl. 2014;10(4-5):498-518. doi: 10.1504/IJBRA.2014.062997.
9
Application of supervised machine learning algorithms for the classification of regulatory RNA riboswitches.
Brief Funct Genomics. 2017 Mar 1;16(2):99-105. doi: 10.1093/bfgp/elw005.
10
Development of a new oligonucleotide block location-based feature extraction (BLBFE) method for the classification of riboswitches.
Mol Genet Genomics. 2020 Mar;295(2):525-534. doi: 10.1007/s00438-019-01642-z. Epub 2020 Jan 4.

Cited by

1
RR3DD: an RNA global structure-based RNA three-dimensional structural classification database.
RNA Biol. 2021 Nov 12;18(sup2):738-746. doi: 10.1080/15476286.2021.1989200. Epub 2021 Oct 18.
2
In Vivo Production of RNA Aptamers and Nanoparticles: Problems and Prospects.
Molecules. 2021 Mar 6;26(5):1422. doi: 10.3390/molecules26051422.
