RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins.

Affiliations

Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia.

Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia.

Publication information

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac215.

DOI: 10.1093/bib/bbac215
PMID: 35649392
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9294422/

Abstract

RNA-binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Therefore, computational methods that are capable of accurately identifying RBPs are highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose a two-stage deep transfer learning-based framework, termed RBP-TSTL, for accurate prediction of RBPs. In the first stage, the knowledge from the self-supervised pre-trained model was extracted as feature embeddings and used to represent the protein sequences, while in the second stage, a customized deep learning model was initialized based on an annotated pre-training RBP dataset before being fine-tuned on each corresponding target species dataset. This two-stage transfer learning framework can enable the RBP-TSTL model to be effectively trained to learn and improve the prediction performance. Extensive performance benchmarking of the RBP-TSTL models trained using the features generated by the self-supervised pre-trained model and other models trained using hand-crafted encoding features demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy based on the self-supervised pre-trained models. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli, and Salmonella and established a computational compendium containing all the predicted putative RBP candidates. We anticipate that the proposed RBP-TSTL approach will be explored as a useful tool for the characterization of RNA-binding proteins and exploration of their sequence-structure-function relationships.
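
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the embeddings are synthetic stand-ins for the features a self-supervised pre-trained protein language model would produce, the classifier is a plain logistic regression rather than the paper's customized deep learning model, and every name and setting (`make_embeddings`, `train`, `accuracy`, `DIM`, the dataset sizes and learning rates) is a hypothetical choice made for illustration.

```python
import math
import random

DIM = 8  # hypothetical embedding size; real language-model embeddings are much larger


def make_embeddings(rng, n, shift):
    """Stage 1 stand-in: return n (vector, label) pairs of synthetic embeddings.

    Label-1 ("RBP") vectors are centred at +shift, label-0 vectors at -shift.
    """
    data = []
    for i in range(n):
        label = i % 2
        centre = shift if label == 1 else -shift
        data.append(([rng.gauss(centre, 1.0) for _ in range(DIM)], label))
    return data


def predict_proba(weights, bias, x):
    """Logistic-regression probability that x belongs to class 1."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(data, weights, bias, lr, epochs):
    """Run SGD on the logistic loss, starting from the given parameters.

    Returns new (weights, bias); the input list is copied, not mutated.
    """
    weights = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def accuracy(data, weights, bias):
    """Fraction of examples classified correctly at a 0.5 threshold."""
    hits = sum((predict_proba(weights, bias, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
source = make_embeddings(rng, 400, shift=1.0)        # large annotated pre-training set
target_train = make_embeddings(rng, 20, shift=0.8)   # small target-species training set
target_test = make_embeddings(rng, 200, shift=0.8)   # held-out target-species set

# Stage 2a: initialize the classifier by training on the annotated source set.
pre_w, pre_b = train(source, [0.0] * DIM, 0.0, lr=0.05, epochs=5)

# Stage 2b: fine-tune on the target species, starting from the pre-trained parameters.
ft_w, ft_b = train(target_train, pre_w, pre_b, lr=0.01, epochs=5)

acc_pre = accuracy(target_test, pre_w, pre_b)
acc_ft = accuracy(target_test, ft_w, ft_b)
print(f"target accuracy before fine-tuning: {acc_pre:.3f}")
print(f"target accuracy after fine-tuning:  {acc_ft:.3f}")
```

The point of the sketch is the initialization: the second call to `train` starts from the pre-trained parameters instead of zeros, which is how a small target-species dataset can benefit from the larger annotated set.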
