Lu Chengqian, Zeng Min, Wu Fang-Xiang, Li Min, Wang Jianxin
School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China.
Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha 410083, P.R. China.
Bioinformatics. 2021 Apr 5;36(24):5656-5664. doi: 10.1093/bioinformatics/btaa1077.
Emerging studies indicate that circular RNAs (circRNAs) are widely involved in the progression of human diseases. Owing to their stable structure, circRNAs are promising diagnostic and prognostic biomarkers for diseases. However, experimental verification of circRNA-disease associations is expensive and limited to small scale. Effective computational methods for predicting potential circRNA-disease associations are therefore urgently needed. Although several models have been proposed, over-reliance on known associations and the absence of biological function characteristics still make precise prediction challenging.
In this study, we propose CDASOR, a method for predicting circRNA-disease associations based on sequence and ontology representations, built on convolutional and recurrent neural networks. For circRNA sequences, we encode them with continuous k-mers, obtain low-dimensional vectors of the k-mers, extract local feature vectors with a 1D CNN and learn long-term dependencies with a bi-directional long short-term memory network. For diseases, we serialize the disease ontology into sentences that preserve the ontology hierarchy, obtain low-dimensional vectors for disease ontology terms and capture the dependencies among terms. Furthermore, we learn association patterns of circRNAs and diseases from known circRNA-disease associations with neural networks. These steps yield high-level representations of circRNAs and diseases, which are informative for prediction. The experimental results show that CDASOR provides accurate predictions. By incorporating characteristics of biological functions, CDASOR also achieves impressive predictions in the de novo test. In addition, 6 of the top-10 predicted results are verified by the published literature in the case studies.
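As a rough illustration of the k-mer encoding step described above, the following minimal sketch splits a sequence into overlapping k-mers and maps each one to an integer index that an embedding layer could consume. It is not taken from the CDASOR code: the function names `kmer_tokens`, `build_vocab` and `encode`, the choice k = 3, the stride of 1 and the reservation of index 0 for padding or unknown k-mers are all assumptions made for this example.

```python
from itertools import product


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers with stride 1 (assumed reading of 'continuous k-mers')."""
    seq = seq.upper().replace("T", "U")  # normalize DNA-style input to the RNA alphabet
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=3, alphabet="ACGU"):
    """Map every possible k-mer to a positive integer; 0 is kept for padding/unknown (assumption)."""
    return {"".join(p): i + 1 for i, p in enumerate(product(alphabet, repeat=k))}


def encode(seq, vocab, k=3):
    """Turn a sequence into a list of k-mer indices, using 0 for k-mers outside the vocabulary."""
    return [vocab.get(tok, 0) for tok in kmer_tokens(seq, k)]


vocab = build_vocab(3)
tokens = kmer_tokens("ACGUAC", 3)   # ['ACG', 'CGU', 'GUA', 'UAC']
ids = encode("ACGUAC", vocab, 3)    # [7, 28, 45, 50]
```

In a pipeline of the kind the abstract describes, such index lists would typically be padded to a common length and passed through an embedding layer before the 1D CNN and the bi-directional LSTM; how CDASOR actually does this is documented in its repository.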
The code and data of CDASOR are freely available at https://github.com/BioinformaticsCSU/CDASOR.