
RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins.

Affiliations

Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia.

Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia.

Publication information

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac215.

Abstract

RNA-binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in a myriad of biological processes, such as RNA localization and gene regulation. Computational methods that can accurately identify RBPs are therefore highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose a two-stage deep transfer learning framework, termed RBP-TSTL, for accurate prediction of RBPs. In the first stage, knowledge from a self-supervised pre-trained model was extracted as feature embeddings and used to represent the protein sequences. In the second stage, a customized deep learning model was initialized by training on an annotated RBP pre-training dataset and then fine-tuned on the dataset of each target species. This two-stage transfer learning framework enables the RBP-TSTL model to be trained effectively and improves its prediction performance. Extensive benchmarking, comparing RBP-TSTL models trained on features generated by the self-supervised pre-trained model with other models trained on hand-crafted encoding features, demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy based on self-supervised pre-trained models. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli, and Salmonella, and established a computational compendium containing all the predicted putative RBP candidates. We anticipate that RBP-TSTL will serve as a useful tool for characterizing RNA-binding proteins and exploring their sequence-structure-function relationships.
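The second stage described above (pre-train on a general annotated RBP set, then fine-tune on one target species) can be illustrated with a minimal sketch. This is not the RBP-TSTL implementation: the paper uses a customized deep learning model and embeddings from a self-supervised pre-trained protein model, whereas here a plain logistic-regression classifier stands in for the network, and random Gaussian vectors from the hypothetical helper `synthetic_embeddings` stand in for the stage-one embeddings. All function names (`train`, `two_stage`, and so on) and all hyperparameters are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, int]]  # (embedding, label): 1 = RBP, 0 = non-RBP


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Vector, bias: float, x: Vector) -> float:
    # Probability that embedding x belongs to an RBP.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(data: Dataset, weights: Vector, bias: float,
          lr: float, epochs: int) -> Tuple[Vector, float]:
    # Stochastic gradient descent on binary cross-entropy.
    # Training starts from the given weights, which is what turns a
    # second call into fine-tuning rather than training from scratch.
    w = list(weights)
    b = bias
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y  # gradient of BCE w.r.t. the logit
            for i in range(len(w)):
                w[i] -= lr * err * x[i]
            b -= lr * err
    return w, b


def accuracy(data: Dataset, w: Vector, b: float) -> float:
    correct = sum((predict(w, b, x) >= 0.5) == (y == 1) for x, y in data)
    return correct / len(data)


def synthetic_embeddings(n: int, dim: int, shift: float,
                         rng: random.Random) -> Dataset:
    # Hypothetical stand-in for stage 1: random vectors, NOT real
    # pre-trained protein embeddings. Positives are centred at +shift,
    # negatives at -shift, in every dimension.
    data: Dataset = []
    for _ in range(n):
        y = rng.randint(0, 1)
        centre = shift if y == 1 else -shift
        data.append(([rng.gauss(centre, 1.0) for _ in range(dim)], y))
    return data


def two_stage(source: Dataset, target: Dataset, dim: int,
              pretrain_epochs: int = 20,
              finetune_epochs: int = 5) -> Tuple[Vector, float]:
    # Stage 2a: pre-train on the larger, general annotated RBP dataset.
    w, b = train(source, [0.0] * dim, 0.0, lr=0.05, epochs=pretrain_epochs)
    # Stage 2b: fine-tune those weights on the smaller species dataset,
    # using a lower learning rate so the pre-trained knowledge is kept.
    return train(target, w, b, lr=0.01, epochs=finetune_epochs)


if __name__ == "__main__":
    rng = random.Random(0)
    DIM = 8
    source = synthetic_embeddings(500, DIM, 1.0, rng)
    target = synthetic_embeddings(60, DIM, 0.8, rng)  # mimics a species shift
    w, b = two_stage(source, target, DIM)
    print(f"target accuracy: {accuracy(target, w, b):.2f}")
```

The lower fine-tuning learning rate is one common way to adapt pre-trained weights without overwriting them; the paper's actual schedule and architecture may differ.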
