Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.
School of Artificial Intelligence, Jilin University, Changchun 130012, China.
Molecules. 2020 Sep 23;25(19):4372. doi: 10.3390/molecules25194372.
Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) are non-coding RNAs that play significant regulatory roles in many life processes. Accumulating evidence shows that the interaction patterns between lncRNAs and miRNAs are closely related to cancer development, gene regulation, cellular metabolic processes, and more. Meanwhile, with the rapid development of RNA sequencing technology, numerous novel lncRNAs and miRNAs have been discovered, which might help to explore novel regulatory patterns. However, the growing number of unknown interactions between lncRNAs and miRNAs may hinder the discovery of such patterns, and wet-lab experiments to identify potential interactions are costly and time-consuming. Furthermore, few computational tools are available for predicting lncRNA-miRNA interactions at the sequence level. In this paper, we propose a hybrid sequence feature-based model, LncMirNet (lncRNA-miRNA interactions network), to predict lncRNA-miRNA interactions via deep convolutional neural networks (CNN). First, four categories of sequence-based features are introduced to encode lncRNA/miRNA sequences: k-mer (k = 1, 2, 3, 4), composition transition distribution (CTD), doc2vec, and graph embedding features. Then, to fit the CNN learning pattern, a histogram-dd method is used to fuse the multiple feature types into a matrix. Finally, LncMirNet was compared with six other state-of-the-art methods on a real dataset collected from lncRNASNP2 via five-fold cross-validation and outperformed them. LncMirNet increased accuracy and area under the curve (AUC) each by more than 3% over the other tools, and improved the Matthews correlation coefficient (MCC) by more than 6%. These results show that LncMirNet can predict potential interactions between lncRNAs and miRNAs with high confidence.
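As an illustration of the first feature category named in the abstract, the k-mer encoding for k = 1, 2, 3, 4 can be sketched as below. This is a minimal sketch, not the authors' implementation: the function name `kmer_features`, the per-k frequency normalization, the mapping of T to U, and the skipping of windows that contain ambiguous bases are all assumptions made for the example.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA nucleotides; input T is mapped to U (assumption)


def kmer_features(seq, ks=(1, 2, 3, 4)):
    """Return concatenated k-mer frequency vectors for an RNA sequence.

    Hypothetical helper: for each k, count every overlapping k-mer and
    divide by the number of counted windows, so each block sums to 1.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for k in ks:
        # Fixed lexicographic ordering of all 4**k possible k-mers.
        kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        for i in range(max(len(seq) - k + 1, 0)):
            window = seq[i:i + k]
            if window in counts:  # skip windows with ambiguous bases such as N
                counts[window] += 1
        total = sum(counts.values())
        features.extend(counts[m] / total if total else 0.0 for m in kmers)
    return features


if __name__ == "__main__":
    vec = kmer_features("AUGGCUA")
    print(len(vec))  # 4 + 16 + 64 + 256 entries
```

With k = 1 to 4 the vector has 4 + 16 + 64 + 256 = 340 entries regardless of sequence length, which is what makes a fixed-size encoding of variable-length lncRNA and miRNA sequences possible before any fusion step.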