RPIPCM: A deep network model for predicting lncRNA-protein interaction based on sequence feature encoding.

Author information

Gong Lejun, Chen Jingmei, Cui Xiong, Liu Yang

Affiliations

School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China.

Publication information

Comput Biol Med. 2023 Oct;165:107366. doi: 10.1016/j.compbiomed.2023.107366. Epub 2023 Aug 14.

Abstract

LncRNA-protein interactions (LPIs) play an important regulatory role in biological processes. This paper proposes RPIPCM, a novel deep network model that uses sequence feature encodings of both the RNA and the protein to predict LPIs. To address the imbalance between positive and negative samples, a sliding-window negative sampling method is proposed. Comparative experiments show that this negative sampling method is effective and helps solve the data imbalance problem in existing LPI research. Experimental results also show that the proposed sequence feature encoding method performs well in predicting LPIs on datasets of different sizes and types. On the animal dataset RPI488, compared with a model that directly encodes the original sequences, the sequence feature encoding model increased accuracy by 1.02%, recall by 4.08%, and MCC by 1.67%. On the plant dataset ATH948, sequence feature-based encoding achieved 1.58% higher accuracy, 1.53% higher recall, 1.62% higher specificity, 1.62% higher precision, and 3.16% higher MCC than direct original sequence-based encoding. On the ZEA22133 dataset, RPIPCM outperformed the latest prediction work, increasing accuracy by 2.23%, recall by 1.78%, specificity by 2.67%, precision by 2.52%, and MCC by 4.43%, which further demonstrates its effectiveness and robustness. In conclusion, RPIPCM, a deep network model based on sequence feature encoding, can automatically mine hidden feature information from the sequences involved in lncRNA-protein interactions without relying on external features or prior biomedical knowledge, and its low cost and high efficiency can provide a reference for biomedical researchers.
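
The abstract reports results as accuracy, recall, specificity, precision, and MCC (Matthews correlation coefficient). For readers comparing these numbers, the following is a minimal sketch of the standard definitions computed from a binary confusion matrix. It assumes label 1 marks an interacting lncRNA-protein pair and label 0 a non-interacting pair; the names `ConfusionCounts`, `confusion_from_labels`, and `binary_metrics` are hypothetical, and this is not the authors' evaluation code.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    # Assumption: label 1 = interacting pair, label 0 = non-interacting pair.
    tp: int  # interacting pairs predicted as interacting
    tn: int  # non-interacting pairs predicted as non-interacting
    fp: int  # non-interacting pairs predicted as interacting
    fn: int  # interacting pairs predicted as non-interacting


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Count TP/TN/FP/FN from paired 0/1 labels (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        else:
            if p == 1:
                fp += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, tn=tn, fp=fp, fn=fn)


def binary_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard binary-classification metrics from a confusion matrix."""
    total = c.tp + c.tn + c.fp + c.fn
    accuracy = _safe_div(c.tp + c.tn, total)
    recall = _safe_div(c.tp, c.tp + c.fn)        # also called sensitivity
    specificity = _safe_div(c.tn, c.tn + c.fp)
    precision = _safe_div(c.tp, c.tp + c.fp)
    # MCC lies in [-1, 1]; it uses all four cells of the confusion matrix.
    mcc_den = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    mcc = _safe_div(c.tp * c.tn - c.fp * c.fn, mcc_den)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels only, not data from the paper.
    counts = confusion_from_labels([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
    for name, value in binary_metrics(counts).items():
        print(f"{name}: {value:.4f}")
```

Because MCC combines all four confusion-matrix cells, it is less sensitive to class imbalance than accuracy alone, which is why it is a useful companion metric when positive and negative samples are unbalanced, as discussed in the abstract.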
