

SAWRPI: A Stacking Ensemble Framework With Adaptive Weight for Predicting ncRNA-Protein Interactions Using Sequence Information.

Authors

Ren Zhong-Hao, Yu Chang-Qing, Li Li-Ping, You Zhu-Hong, Guan Yong-Jian, Li Yue-Chao, Pan Jie

Affiliations

School of Information Engineering, Xijing University, Xi'an, China.

School of Computer Science, Northwestern Polytechnical University, Xi'an, China.

Publication

Front Genet. 2022 Feb 28;13:839540. doi: 10.3389/fgene.2022.839540. eCollection 2022.

Abstract

Non-coding RNAs (ncRNAs) play essential roles in biological processes such as gene regulation. One critical way in which ncRNAs carry out their biological functions is through interactions with RNA-binding proteins (RBPs). Identifying the proteins involved in ncRNA-protein interactions therefore helps to clarify ncRNA function. Many high-throughput experiments have been used to detect these interactions. Because such approaches are time-consuming and labor-intensive, a large number of computational methods have been developed to advance ncRNA-protein interaction research. However, these methods may not be applicable to all RNAs and proteins, particularly new ones. In addition, most of them handle long sequences poorly. In this work, a computational method, SAWRPI, is proposed to predict ncRNA-protein interactions from sequence information. More specifically, raw protein features are first extracted with a k-mer sparse matrix followed by SVD reduction, and raw ncRNA features are obtained by learning nucleic acid symbols through natural language processing with a local fusion strategy. Then, to make classification easier, the Hilbert transformation is used to map the raw feature data into a new feature space. Finally, a stacking ensemble strategy is adopted to learn high-level abstract features automatically and to generate the final prediction results. To confirm robustness and stability, three different datasets containing two kinds of interactions are used. Compared with state-of-the-art methods and with other classification or feature extraction strategies, SAWRPI achieved high performance on all three datasets, which contain two kinds of lncRNA-protein interactions. Based on our findings, SAWRPI is a trustworthy, robust, yet simple method that can serve as a useful supplement for predicting ncRNA-protein interactions.
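To make two of the steps named in the abstract more concrete, the following is a minimal Python sketch of a k-mer sparse matrix and a discrete Hilbert transform. It is an illustration under stated assumptions, not the authors' implementation: the function names, the value of k, the position-indicator layout of the matrix, and the FFT-style analytic-signal convention are all hypothetical choices, since the abstract gives no parameters. The abstract applies the k-mer sparse matrix to protein sequences; an RNA alphabet is used here only to keep the demo small. SVD reduction, the NLP embedding of ncRNA, and the stacking ensemble are omitted.

```python
import cmath
import math
from itertools import product


def kmer_sparse_matrix(seq, alphabet="ACGU", k=3):
    """Build a sparse k-mer indicator matrix (assumed layout, not from the paper).

    Rows index all len(alphabet)**k possible k-mers, columns index start
    positions; an entry is 1 where that k-mer begins at that position.
    Returns (entries, n_rows, n_cols) with entries as {(row, col): 1}.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    n_cols = max(len(seq) - k + 1, 0)
    entries = {}
    for pos in range(n_cols):
        row = index.get(seq[pos:pos + k])
        if row is not None:  # skip k-mers with symbols outside the alphabet
            entries[(row, pos)] = 1
    return entries, len(index), n_cols


def hilbert_transform(x):
    """Discrete Hilbert transform of a real sequence via a naive O(n^2) DFT.

    Forms the analytic signal by keeping the DC (and Nyquist) bins, doubling
    the positive frequencies, zeroing the negative ones, and returns its
    imaginary part.
    """
    n = len(x)
    if n == 0:
        return []
    # Forward DFT of the input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]
    # One-sided spectral weights for the analytic signal.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for f in positive:
        weights[f] = 2.0
    # Inverse DFT of the weighted spectrum; the imaginary part is the transform.
    return [
        sum(weights[f] * spectrum[f] * cmath.exp(2j * math.pi * f * t / n)
            for f in range(n)).imag / n
        for t in range(n)
    ]


if __name__ == "__main__":
    entries, n_rows, n_cols = kmer_sparse_matrix("ACGUAC", k=2)
    print(n_rows, n_cols, sorted(entries))
    print(hilbert_transform([1.0, 0.0, -1.0, 0.0]))
```

In a fuller pipeline, the sparse matrix would be densified and reduced (for example with SVD) to a fixed-length vector before the transform is applied; how the paper orders and parameterizes these steps is not stated in the abstract.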


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8eaf/8963817/85c310629ae2/fgene-13-839540-g001.jpg
