Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China.
School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab028.
Emerging research shows that circular RNA (circRNA) plays a crucial role in the occurrence, diagnosis and prognosis of complex human diseases. Compared with traditional biological experiments, computational methods that fuse multi-source biological data to identify circRNA-disease associations can effectively reduce cost and save time. Considering the limitations of existing computational models, we propose SGANRDA, a semi-supervised generative adversarial network (GAN) model for predicting circRNA-disease associations. The model first fuses natural language features of circRNA sequences with disease semantic features and the Gaussian interaction profile kernels of circRNAs and diseases. It then uses all circRNA-disease pairs to pre-train the GAN and fine-tunes the network parameters with labeled samples. Finally, an extreme learning machine classifier is employed to obtain the prediction results. Compared with previous supervised models, SGANRDA introduces circRNA sequences and uses the information of all circRNA-disease pairs during pre-training. This step can increase the information content of the features to some extent and reduce the impact of the small number of known associations on model performance. SGANRDA obtained AUC scores of 0.9411 and 0.9223 in leave-one-out cross-validation and 5-fold cross-validation, respectively. Prediction results on the benchmark dataset show that SGANRDA outperforms other existing models. In addition, 25 of the top 30 circRNA-disease pairs scored highest by SGANRDA in case studies were verified by recent literature. These experimental results demonstrate that SGANRDA is a useful model for predicting circRNA-disease associations and can provide reliable candidates for biological experiments.
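As an illustration of two components named in the abstract, the following is a minimal sketch of a Gaussian interaction profile (GIP) kernel and an extreme learning machine (ELM) classifier. It is not the authors' implementation: the toy association matrix, the function and class names, the hidden layer size and the bandwidth setting are all assumptions made for the example, and the sequence features, disease semantic features and GAN pre-training stage are omitted entirely.

```python
import numpy as np


def gip_kernel(profiles: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """Gaussian interaction profile kernel between the rows of `profiles`.

    K[i, j] = exp(-gamma * ||p_i - p_j||^2), where gamma is gamma_prime
    divided by the mean squared norm of the profiles (a common convention,
    assumed here; the abstract does not state the bandwidth).
    """
    sq_norms = (profiles ** 2).sum(axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("profiles contain no known associations")
    gamma = gamma_prime / mean_sq
    # Pairwise squared Euclidean distances via the expansion of ||a - b||^2.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


class ELMClassifier:
    """Single-hidden-layer ELM: random fixed hidden weights, least-squares output."""

    def __init__(self, n_hidden: int = 64, seed: int = 0):
        self.n_hidden = n_hidden
        self.seed = seed

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Sigmoid activation of the random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELMClassifier":
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        # Output weights from the Moore-Penrose pseudoinverse of the hidden matrix.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def decision_function(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta


# Hypothetical toy data: 4 circRNAs x 3 diseases, 1 = known association.
A = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 1]], dtype=float)

KC = gip_kernel(A)    # circRNA-circRNA similarity (rows of A as profiles)
KD = gip_kernel(A.T)  # disease-disease similarity (columns of A as profiles)

# One feature vector per circRNA-disease pair: the circRNA's similarity row
# concatenated with the disease's similarity row.
pairs = [(i, j) for i in range(A.shape[0]) for j in range(A.shape[1])]
X = np.array([np.concatenate([KC[i], KD[j]]) for i, j in pairs])
y = np.array([A[i, j] for i, j in pairs])

scores = ELMClassifier(n_hidden=64, seed=0).fit(X, y).decision_function(X)
```

In this sketch the classifier is fitted and scored on the same toy pairs only to show the data flow. An evaluation comparable to the one reported would hold out pairs, as in leave-one-out or 5-fold cross-validation.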