College of Artificial Intelligence, Nankai University, Tongyan Road, 300350, Tianjin, China.
College of Computer Science, Nankai University, Tongyan Road, 300350, Tianjin, China.
BMC Bioinformatics. 2021 Mar 21;22(1):136. doi: 10.1186/s12859-021-04073-z.
Numerous studies have demonstrated that long non-coding RNAs (lncRNAs) are associated with many human diseases. Predicting potential lncRNA-disease associations is therefore crucial for disease prognosis, diagnosis and therapy. Dozens of machine learning and deep learning algorithms have been applied to this problem, yet it remains challenging to learn effective low-dimensional representations from the high-dimensional features of lncRNAs and diseases in order to predict unknown lncRNA-disease associations accurately.
We propose an end-to-end model, VGAELDA, which integrates variational inference and graph autoencoders for lncRNA-disease association prediction. VGAELDA contains two kinds of graph autoencoders: variational graph autoencoders (VGAEs) infer representations from the features of lncRNAs and of diseases, respectively, while graph autoencoders (GAEs) propagate labels via known lncRNA-disease associations. The two kinds of autoencoders are trained alternately using the variational expectation-maximization (EM) algorithm. Combining VGAEs for graph representation learning with alternate training via variational inference strengthens the ability of VGAELDA to capture effective low-dimensional representations from high-dimensional features, and hence improves the robustness and precision of its predictions of unknown lncRNA-disease associations. Further analysis shows that the co-training framework of lncRNAs and diseases designed for VGAELDA solves, via a deep learning approach, a geometric matrix completion problem for capturing effective low-dimensional representations.
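As a rough illustration of the VGAE component described above, the following Python/NumPy sketch runs the forward pass of a generic variational graph autoencoder in the style of Kipf and Welling (2016): a GCN encoder produces a Gaussian posterior per node, a latent sample is drawn with the reparameterization trick, and an inner-product decoder reconstructs edge probabilities. It is not the VGAELDA implementation. The toy random graphs, feature sizes, untrained weights and the `score_pairs` function (scoring lncRNA-disease pairs from the two latent spaces) are all hypothetical, and the label-propagating GAEs and the variational EM alternation are not shown.

```python
import numpy as np


def normalize_adjacency(adj):
    """GCN-style symmetric normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))   # degrees are >= 1
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_weights(n_feat, n_hidden, n_latent, rng, scale=0.1):
    """Small random weights for one encoder (toy initialization, never trained)."""
    return (rng.standard_normal((n_feat, n_hidden)) * scale,
            rng.standard_normal((n_hidden, n_latent)) * scale,
            rng.standard_normal((n_hidden, n_latent)) * scale)


def vgae_encode(adj, features, weights, rng):
    """Generic VGAE encoder; returns a latent sample z and the Gaussian parameters."""
    w_hidden, w_mu, w_logstd = weights
    a_hat = normalize_adjacency(adj)
    hidden = np.maximum(a_hat @ features @ w_hidden, 0.0)  # first GCN layer + ReLU
    mu = a_hat @ hidden @ w_mu                             # mean of q(z | X, A)
    logstd = a_hat @ hidden @ w_logstd                     # log std of q(z | X, A)
    eps = rng.standard_normal(mu.shape)
    z = mu + eps * np.exp(logstd)                          # reparameterization trick
    return z, mu, logstd


def kl_to_standard_normal(mu, logstd):
    """KL(q(z | X, A) || N(0, I)) summed over latent dims, averaged over nodes."""
    per_node = -0.5 * np.sum(1 + 2 * logstd - mu**2 - np.exp(2 * logstd), axis=1)
    return float(per_node.mean())


def inner_product_decoder(z):
    """Reconstructed edge probabilities sigmoid(z z^T) within one graph."""
    return sigmoid(z @ z.T)


def score_pairs(z_lnc, z_dis):
    """HYPOTHETICAL cross-graph scoring of lncRNA-disease pairs: sigmoid(z_lnc z_dis^T)."""
    return sigmoid(z_lnc @ z_dis.T)


def random_graph(n, rng, p=0.5):
    """Random symmetric 0/1 adjacency without self-loops (toy data only)."""
    upper = np.triu((rng.random((n, n)) < p).astype(float), 1)
    return upper + upper.T


rng = np.random.default_rng(0)
n_lnc, n_dis, n_hidden, n_latent = 5, 4, 6, 3
x_lnc = rng.random((n_lnc, 8))   # toy lncRNA features (8 dims, hypothetical)
x_dis = rng.random((n_dis, 7))   # toy disease features (7 dims, hypothetical)

z_lnc, mu_l, ls_l = vgae_encode(random_graph(n_lnc, rng), x_lnc,
                                init_weights(8, n_hidden, n_latent, rng), rng)
z_dis, mu_d, ls_d = vgae_encode(random_graph(n_dis, rng), x_dis,
                                init_weights(7, n_hidden, n_latent, rng), rng)

print(inner_product_decoder(z_lnc).shape)  # (5, 5) lncRNA-lncRNA edge probabilities
print(score_pairs(z_lnc, z_dis).shape)     # (5, 4) hypothetical lncRNA-disease scores
print(kl_to_standard_normal(mu_l, ls_l))   # KL regularizer of the lncRNA encoder
```

In a trained VGAE, a reconstruction loss on the decoder output plus the KL term would be minimized by gradient descent; this sketch only runs the forward pass with random weights, so its scores carry no biological meaning.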
Cross-validation and numerical experiments show that VGAELDA outperforms current state-of-the-art methods in lncRNA-disease association prediction. Case studies indicate that VGAELDA is capable of detecting potential lncRNA-disease associations. The source code and data are available at https://github.com/zhanglabNKU/VGAELDA .