Gualdi Francesco, Oliva Baldomero, Piñero Janet
Integrative Biomedical Informatics, Research Programme on Biomedical Informatics (IBI-GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain.
Structural Bioinformatics Lab, Research Programme on Biomedical Informatics (SBI-GRIB), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, C/Dr Aiguader 88, E-08003 Barcelona, Spain.
NAR Genom Bioinform. 2024 May 14;6(2):lqae049. doi: 10.1093/nargab/lqae049. eCollection 2024 Jun.
Knowledge graph embeddings (KGE) are a powerful technique used in the biomedical domain to represent biological knowledge in a low-dimensional space. However, a deep understanding of these methods is still missing, particularly regarding their application to prioritizing genes associated with complex diseases for which only limited genetic information is available. In this contribution, we built a knowledge graph (KG) by integrating heterogeneous biomedical data and generated KGE by implementing state-of-the-art methods and two novel algorithms: Dlemb and BioKG2vec. Extensive testing of the embeddings with unsupervised clustering and supervised methods showed that KGE can be successfully used to predict genes associated with diseases and that our novel approaches outperform most existing algorithms in both scenarios. Our findings underscore the significance of data quality, preprocessing, and integration in achieving accurate predictions. Additionally, we applied KGE to predict genes linked to Intervertebral Disc Degeneration (IDD) and showed that functions pertinent to the disease are enriched within the prioritized gene set.
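As a rough illustration of how embeddings can support gene prioritization, the sketch below ranks candidate genes by cosine similarity to the centroid of known disease-gene vectors. This is not the Dlemb or BioKG2vec algorithm, nor the evaluation used in the paper; the function name `rank_candidate_genes`, the gene labels, and the 3-dimensional toy vectors are all hypothetical and chosen only to make the idea concrete.

```python
import math

def rank_candidate_genes(embeddings, seed_genes):
    """Rank non-seed genes by cosine similarity to the seed-gene centroid.

    embeddings: dict mapping gene name -> list of floats (assumed precomputed KGE).
    seed_genes: set of genes already associated with the disease.
    """
    dim = len(next(iter(embeddings.values())))
    # Centroid of the known disease genes in embedding space.
    centroid = [
        sum(embeddings[g][i] for g in seed_genes) / len(seed_genes)
        for i in range(dim)
    ]
    centroid_norm = math.sqrt(sum(x * x for x in centroid))

    scores = {}
    for gene, vec in embeddings.items():
        if gene in seed_genes:
            continue  # only score genes not already linked to the disease
        dot = sum(a * b for a, b in zip(vec, centroid))
        vec_norm = math.sqrt(sum(x * x for x in vec))
        scores[gene] = dot / (vec_norm * centroid_norm)

    # Highest similarity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy, made-up embeddings (hypothetical values, not taken from the paper).
toy_embeddings = {
    "GENE_A": [1.0, 0.1, 0.0],
    "GENE_B": [0.9, 0.2, 0.1],
    "GENE_C": [0.8, 0.3, 0.0],
    "GENE_D": [0.0, 1.0, 0.9],
}
ranking = rank_candidate_genes(toy_embeddings, seed_genes={"GENE_A", "GENE_B"})
for gene, score in ranking:
    print(gene, round(score, 3))
```

In this toy setting, GENE_C lies close to the seed genes and ranks above GENE_D. A supervised alternative, as the abstract mentions, would instead train a classifier on the embedding vectors of known disease and non-disease genes.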