circRNA-binding protein site prediction based on multi-view deep learning, subspace learning and multi-view classifier.

Affiliations

Jiangnan University, Wuxi, Jiangsu 214012, China.

School of Artificial Intelligence and Computer Science of Jiangnan University, Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (LCNBI) and ZJLab, Wuxi, Jiangsu 214012, China.

Publication information

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab394.

Abstract

Circular RNAs (circRNAs) generally bind to RNA-binding proteins (RBPs) and thereby play an important role in the regulation of autoimmune diseases, so it is crucial to study the binding sites of RBPs on circRNAs. Although many methods, including traditional machine learning and deep learning, have been developed to predict interactions between RNAs and RBPs, most of them focus on linear RNAs. Few studies have addressed the binding relationships between circRNAs and RBPs, and in-depth research is urgently needed. Existing circRNA-RBP binding site prediction methods take circRNA sequences as the main research subject but do not fully exploit their relevant characteristics, such as the structure and composition information of the sequences. Some methods extract different views to construct recognition models, but how to use multi-view data efficiently for this purpose is still not well studied. To address these problems, this paper proposes DMSK, a multi-view classification method for identifying circRNA-RBP interaction sites that is based on multi-view deep learning, subspace learning and a multi-view classifier. In DMSK, we first converted circRNA sequences into pseudo-amino acid sequences and pseudo-dipeptide components to extract high-dimensional sequence features and component features of circRNAs, respectively. Then, the structure prediction method RNAfold was used to predict the secondary structure of the RNA sequences, and a sequence embedding model was used to extract context-dependent features. Next, we fed the raw features of these four views to a hybrid network, composed of a convolutional neural network and a long short-term memory network, to obtain deep features of circRNAs. Furthermore, we used view-weighted generalized canonical correlation analysis to extract the common features of the four views by subspace learning.
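The abstract does not spell out how the pseudo-amino acid and pseudo-dipeptide views are built. As a minimal sketch of one plausible reading, the Python below maps every overlapping nucleotide triplet to a letter through the standard genetic code and then computes the relative frequency of adjacent letter pairs. The function names, the stride of 1, the `X` placeholder for unknown triplets and the choice of the standard codon table are all assumptions made for illustration, not details confirmed by the source.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code with codons enumerated in UCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def to_pseudo_amino(seq, stride=1):
    """Map each nucleotide triplet to one letter (assumed encoding, stride is hypothetical)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for triplets with unknown bases
        for i in range(0, len(seq) - 2, stride)
    )


def dipeptide_composition(pseudo):
    """Relative frequency of each adjacent letter pair in a pseudo-amino acid string."""
    pairs = Counter(pseudo[i:i + 2] for i in range(len(pseudo) - 1))
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()} if total else {}


if __name__ == "__main__":
    pseudo = to_pseudo_amino("AUGGCC")
    print(pseudo)
    print(dipeptide_composition(pseudo))
```

In a full pipeline, the letter string would typically be one-hot encoded before entering the hybrid network, and the pair frequencies would be laid out as a fixed-length vector over all possible pairs; both steps are omitted here.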
Finally, the learned subspace common features and the multi-view deep features were used to train a downstream multi-view TSK fuzzy system, yielding a multi-view classifier based on fuzzy rules and fuzzy inference. The trained classifier was used to predict the specific positions of RBP binding sites on circRNAs. Experiments show that DMSK improves prediction performance over existing methods. The code and dataset of this study are available at https://github.com/Rebecca3150/DMSK.
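To make the fuzzy-rule classifier concrete, the sketch below shows inference in a generic first-order TSK (Takagi-Sugeno-Kang) fuzzy system with Gaussian antecedents, plus a simple weighted combination of per-view outputs. It is not the DMSK implementation: the abstract does not describe how rules are learned or how views are combined, so the parameter layout, the function names and the weighted-sum fusion are assumptions chosen for illustration.

```python
import math


def tsk_output(x, centers, widths, consequents):
    """First-order TSK inference for one view.

    centers[k][d], widths[k][d]: Gaussian antecedent parameters of rule k.
    consequents[k]: [bias, w_1, ..., w_D], the linear consequent of rule k.
    """
    firing = []
    for c, s in zip(centers, widths):
        # Product of per-dimension Gaussian memberships, written as one exponent.
        expo = sum(-((xd - cd) ** 2) / (2 * sd ** 2) for xd, cd, sd in zip(x, c, s))
        firing.append(math.exp(expo))
    total = sum(firing) or 1.0  # guard against every rule firing at zero
    rule_out = [p[0] + sum(w * xd for w, xd in zip(p[1:], x)) for p in consequents]
    # Output is the firing-strength-weighted average of the rule consequents.
    return sum(f / total * y for f, y in zip(firing, rule_out))


def multi_view_score(views, models, view_weights):
    """Weighted sum of per-view TSK outputs (hypothetical fusion rule)."""
    return sum(w * tsk_output(x, *m) for x, m, w in zip(views, models, view_weights))


if __name__ == "__main__":
    # Two toy views with hand-picked, purely illustrative parameters.
    model_a = ([[0.0]], [[1.0]], [[1.0, 2.0]])
    model_b = ([[-1.0], [1.0]], [[1.0], [1.0]], [[0.0, 0.0], [2.0, 0.0]])
    print(multi_view_score([[3.0], [0.0]], [model_a, model_b], [0.5, 0.5]))
```

A binding-site decision would then follow from thresholding the combined score; in practice the antecedent and consequent parameters are fitted from training data rather than set by hand.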

