Li Yang, Keqi Wang, Wang Guohua
College of Information and Computer Engineering, Northeast Forestry University, Harbin 150004, China.
Bioinformatics. 2021 Oct 25;37(20):3579-3587. doi: 10.1093/bioinformatics/btab252.
Quantifying the associations between diseases is of great significance for understanding disease biology, improving disease diagnosis, and repositioning and developing drugs. Consequently, research on disease similarity has received considerable attention in bioinformatics in recent years. Previous work has shown that combining ontologies (such as Disease Ontology and Gene Ontology) with disease-gene interactions is a valuable way to elucidate diseases and disease associations. However, most existing methods are based either on the overlap between disease-related gene sets or on distance within the ontology's hierarchy. In these methods, diseases are represented by discrete or sparse feature vectors, which cannot capture the deep semantic information of diseases. Recently, deep representation learning has been widely studied and gradually applied to various fields of bioinformatics. Based on the hypothesis that the representation of a disease depends on the representations of its related genes, we propose a disease representation model that uses two of the most representative gene resources, HumanNet and Gene Ontology, to construct a new gene network and learn gene (and thus disease) representations. The similarity between two diseases is computed as the cosine similarity of their corresponding representations.
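The final step described above, comparing two diseases by the cosine similarity of their representations, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how gene representations are combined into a disease representation, so averaging is an assumption made here, and the gene names and three-dimensional vectors are hypothetical toy values.

```python
import math

def disease_vector(gene_ids, gene_embeddings):
    """Build a disease vector from its related genes.

    ASSUMPTION: the disease vector is the element-wise mean of its gene
    vectors. The source only states that the disease representation
    depends on its related gene representations.
    """
    vectors = [gene_embeddings[g] for g in gene_ids if g in gene_embeddings]
    if not vectors:
        raise ValueError("none of the disease's genes has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings; real ones would be learned from the gene network.
gene_embeddings = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [0.0, 1.0, 0.0],
    "GENE_C": [0.0, 0.0, 1.0],
}

disease_x = disease_vector(["GENE_A", "GENE_B"], gene_embeddings)
disease_y = disease_vector(["GENE_B", "GENE_C"], gene_embeddings)
similarity = cosine_similarity(disease_x, disease_y)
```

In this toy case the two diseases share one of their two genes, so their averaged vectors overlap in a single dimension and the similarity falls between 0 and 1.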
We propose a novel approach to computing disease similarity that integrates two important factors, disease-related genes and the Gene Ontology hierarchy, to learn disease representations through deep representation learning. Under the same experimental settings, our method achieves an AUC of 0.8074, a 10.1% improvement over the most competitive baseline method. The quantitative and qualitative experimental results show that our model learns effective disease representations and significantly improves the accuracy of disease similarity computation.
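For readers unfamiliar with the metric, the AUC of a similarity method can be read as the probability that a truly similar disease pair receives a higher score than a dissimilar pair. The sketch below computes it by direct pairwise comparison. It is an illustration only: the abstract does not describe the benchmark pairs or the evaluation code, and the labels and scores here are hypothetical toy values.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for a known similar disease pair, 0 for a dissimilar pair.
    scores: the similarity predicted for each pair. Ties count as half.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical toy benchmark: four disease pairs with predicted similarities.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
```

Here three of the four positive-negative comparisons are ranked correctly, so the toy AUC is 0.75. The pairwise loop is quadratic in the number of pairs; a rank-based formulation would be preferable for a large benchmark.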
The results indicate that this method is applicable to tasks such as predicting disease-related genes, transferring treatment methods between diseases, and drug development.
Supplementary data are available at Bioinformatics online.