Inferring LncRNA-disease associations based on graph autoencoder matrix completion.

Author information

Wu Ximin, Lan Wei, Chen Qingfeng, Dong Yi, Liu Jin, Peng Wei

Affiliations

School of Computer, Electronic and Information, Guangxi University, Nanning, China.

School of Computer, Electronic and Information, Guangxi University, Nanning, China; Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China.

Publication information

Comput Biol Chem. 2020 May 20;87:107282. doi: 10.1016/j.compbiolchem.2020.107282.

DOI:10.1016/j.compbiolchem.2020.107282

PMID:32502934

Abstract

Accumulating studies have indicated that long non-coding RNAs (lncRNAs) play crucial roles in a large number of biological processes. Predicting lncRNA-disease associations can help biologists understand the molecular mechanisms of human disease and can benefit disease diagnosis, treatment and prevention. In this paper, we introduce a computational framework based on graph autoencoder matrix completion (GAMCLDA) to identify lncRNA-disease associations. In our method, a graph convolutional network is used to encode the local graph structure and node features in order to learn latent factor vectors for lncRNAs and diseases. The inner product of the lncRNA factor vector and the disease factor vector is then used as the decoder to reconstruct the lncRNA-disease association matrix. In addition, a cost-sensitive neural network is used to deal with the imbalance between positive and negative samples. The experimental results show that GAMCLDA outperforms other state-of-the-art methods in prediction performance, as evaluated by AUC, AUPR, PPV and F1-score. Moreover, the case study shows that our method is an effective tool for predicting potential lncRNA-disease associations.
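
The pipeline described in the abstract (graph-convolutional encoder, inner-product decoder, cost-sensitive loss) can be illustrated with a minimal NumPy sketch of a single forward pass. This is not the authors' implementation. The toy association matrix, the one-hot node features, the single tanh layer, the sigmoid on the decoder output, the weighted cross-entropy form of the loss and all function names are assumptions made for illustration only.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops, so every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def encode(assoc, features, weight):
    """One graph-convolution layer over the bipartite lncRNA-disease graph.

    Returns latent factor vectors for lncRNAs and for diseases.
    """
    n_lnc, n_dis = assoc.shape
    # Square adjacency of the bipartite graph: lncRNA nodes first, then diseases.
    adj = np.block([
        [np.zeros((n_lnc, n_lnc)), assoc],
        [assoc.T, np.zeros((n_dis, n_dis))],
    ])
    hidden = np.tanh(normalize_adjacency(adj) @ features @ weight)
    return hidden[:n_lnc], hidden[n_lnc:]


def decode(z_lnc, z_dis):
    """Inner-product decoder; the sigmoid maps scores to (0, 1) (an assumption)."""
    return 1.0 / (1.0 + np.exp(-(z_lnc @ z_dis.T)))


def cost_sensitive_loss(assoc, probs, pos_weight):
    """Weighted cross-entropy: errors on known associations cost pos_weight times more."""
    eps = 1e-9
    per_entry = -(pos_weight * assoc * np.log(probs + eps)
                  + (1.0 - assoc) * np.log(1.0 - probs + eps))
    return float(per_entry.mean())


# Toy data (hypothetical): 4 lncRNAs x 3 diseases, 1 = known association.
assoc = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [1, 1, 0],
                  [0, 0, 1]], dtype=float)
rng = np.random.default_rng(0)
features = np.eye(7)                              # one-hot features for the 7 nodes
weight = rng.normal(scale=0.5, size=(7, 2))       # untrained weights, latent size 2

z_lnc, z_dis = encode(assoc, features, weight)
probs = decode(z_lnc, z_dis)
# Weight positives by the negative-to-positive ratio to offset class imbalance.
pos_weight = (assoc == 0).sum() / (assoc == 1).sum()
loss = cost_sensitive_loss(assoc, probs, pos_weight)
print(probs.shape, loss)
```

In a trained model the weights would be updated by gradient descent on this loss, and the entries of `probs` at positions where `assoc` is 0 would be ranked to propose candidate associations.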
