Mohamed bin Zayed University of Artificial Intelligence, Masdar City, UAE.
Department of Gastroenterology, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an 710038, Shaanxi, China.
Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad317.
MicroRNAs (miRNAs) silence genes by binding to messenger RNAs, whereas long non-coding RNAs (lncRNAs) act as competitive endogenous RNAs (ceRNAs) that can relieve miRNA-mediated silencing and thereby upregulate target gene expression. Because of its medical importance, the ceRNA association between lncRNAs and miRNAs has become a research hotspot, but it is challenging to verify experimentally. In this paper, we propose a novel deep learning scheme, the sequence pre-training-based graph neural network (SPGNN), which combines a pre-training stage and a fine-tuning stage to predict lncRNA-miRNA associations from RNA sequences and from existing interactions represented as a graph. In the pre-training stage, we use a sequence-to-vector technique to generate embeddings from the sequences of all RNAs. In the fine-tuning stage, we use a graph neural network (GNN) to learn node representations from a heterogeneous graph constructed from known lncRNA-miRNA associations. We evaluate SPGNN on our newly collected animal lncRNA-miRNA association dataset and show that combining the $k$-mer technique and the Doc2vec model for pre-training with Simple Graph Convolution (SGC) for fine-tuning is effective for predicting lncRNA-miRNA associations. Our approach outperforms state-of-the-art baselines across various evaluation metrics. We also conduct an ablation study and a hyperparameter analysis to verify the contribution of each component and parameter of the scheme. The complete code and dataset are available on GitHub: https://github.com/zixwang/SPGNN.
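The two stages named in the abstract can be illustrated with a small, dependency-free sketch. This is not the SPGNN code from the repository; it rests on several assumptions. RNA sequences are split into overlapping $k$-mers with stride 1, and these serve as the "words" that a Doc2vec model (for example gensim's `Doc2Vec` with tagged documents) would embed; Doc2vec training itself is omitted. Only the parameter-free SGC propagation $S^K X$ with $S = D^{-1/2}(A+I)D^{-1/2}$ is shown, without the learned linear layer. The dot-product link score and all function names (`kmer_tokens`, `normalized_adjacency`, `sgc_propagate`, `link_score`) are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1.

    Assumption: each k-mer is a 'word' and the token list for one RNA is
    the 'document' a Doc2vec model would embed (training not shown here).
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalized_adjacency(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Return S = D^{-1/2} (A + I) D^{-1/2} for an undirected graph."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop from SGC's renormalization (A + I)
    for u, v in edges:  # e.g. lncRNA node u associated with miRNA node v
        a[u][v] = 1.0
        a[v][u] = 1.0
    deg = [sum(row) for row in a]  # degrees of A + I, always >= 1
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def sgc_propagate(s: Matrix, x: Matrix, k: int = 2) -> Matrix:
    """Apply K propagation steps, S^K X, with no nonlinearity (as in SGC)."""
    for _ in range(k):
        x = matmul(s, x)
    return x


def link_score(z_u: List[float], z_v: List[float]) -> float:
    """Hypothetical decoder: sigmoid of the dot product of two embeddings."""
    dot = sum(p * q for p, q in zip(z_u, z_v))
    return 1.0 / (1.0 + math.exp(-dot))


if __name__ == "__main__":
    print(kmer_tokens("AUGGCU", k=3))  # ['AUG', 'UGG', 'GGC', 'GCU']
    # Toy graph: nodes 0 and 1 stand for lncRNAs, node 2 for a miRNA.
    s = normalized_adjacency(3, [(0, 2), (1, 2)])
    x = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # stand-in for pre-trained embeddings
    z = sgc_propagate(s, x, k=2)
    print(link_score(z[0], z[2]))
```

Separating a fixed propagation step from a simple learned classifier is what makes SGC cheap to train. In a setup like this, the pre-trained sequence embeddings would be the initial node features `x`, so sequence information and graph structure are combined before the link score is computed.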