LPI-SKMSC: Predicting LncRNA-Protein Interactions with Segmented k-mer Frequencies and Multi-space Clustering.

Author information

School of Electrical Engineering and Automation, Anhui University, Hefei, 230601, China.

School of Computer Science and Technology, Anhui University, Hefei, 230601, China.

Publication information

Interdiscip Sci. 2024 Jun;16(2):378-391. doi: 10.1007/s12539-023-00598-4. Epub 2024 Jan 11.

Abstract

Long noncoding RNAs (lncRNAs) play significant regulatory roles in gene expression, and one way they do so is by interacting with proteins. Because experiments to determine lncRNA-protein interactions (LPIs) are expensive and time-consuming, many computational methods for predicting LPIs have been proposed as alternatives. In LPI prediction, positive and negative samples are commonly imbalanced, yet few existing methods address this problem specifically. In this paper, we propose LPI-SKMSC, a new clustering-based LPI prediction method that uses segmented k-mer frequencies and multi-space clustering and is designed to handle the imbalance between positive and negative samples. We construct segmented k-mer frequencies to capture both global and local features of lncRNA and protein sequences. Multi-space clustering is then applied: convolutional neural network (CNN)-based encoders map the different features of a sample to different spaces, so that multiple spaces jointly constrain the classification of the sample. Finally, the distance between the encoder's output features and the cluster center is calculated in each space, and the sum of these distances over all spaces is compared with the cluster radius to predict LPIs. In cross-validation on 3 public datasets, LPI-SKMSC showed the best performance among the existing methods compared. The experimental results show that LPI-SKMSC predicts LPIs more effectively when positive and negative samples are imbalanced. In addition, we illustrate that our model is better at uncovering potential lncRNA-protein interaction pairs.
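
The two mechanical steps named in the abstract, segmented k-mer frequencies and the distance-sum decision, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal-length segmentation, the Euclidean distance, and the rule that a distance sum within the radius means "interaction" are all assumptions made for illustration, and the CNN encoders are left out (the embeddings are taken as given).

```python
from itertools import product

import numpy as np


def kmer_frequencies(seq, k, alphabet):
    """Normalized counts of every possible k-mer over `alphabet` in `seq`."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing characters outside the alphabet
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def segmented_kmer_frequencies(seq, k, alphabet, n_segments):
    """Global k-mer frequencies followed by per-segment (local) frequencies.

    Assumption: segments are contiguous and of (nearly) equal length.
    """
    features = [kmer_frequencies(seq, k, alphabet)]  # global view
    bounds = np.linspace(0, len(seq), n_segments + 1).astype(int)
    for start, end in zip(bounds[:-1], bounds[1:]):
        features.append(kmer_frequencies(seq[start:end], k, alphabet))  # local view
    return np.concatenate(features)


def predict_interaction(embeddings, centers, radius):
    """Sum the distance to the cluster center in each space and compare with the radius.

    Assumption: Euclidean distance, and a sum within the radius means "interaction".
    """
    total = sum(np.linalg.norm(z - c) for z, c in zip(embeddings, centers))
    return total <= radius


# Hypothetical usage: an 8-nt RNA fragment, 2-mers, 2 segments.
features = segmented_kmer_frequencies("ACGUACGU", k=2, alphabet="ACGU", n_segments=2)
```

Here `features` concatenates one global vector and two local vectors of 16 entries each. In the method described above, such features would be passed through the encoders before the distance sum is computed; the sketch skips that step.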
