Bringing Light Into the Dark: A Large-Scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework.

Author information

Ali Mehdi, Berrendorf Max, Hoyt Charles Tapley, Vermue Laurent, Galkin Mikhail, Sharifzadeh Sahand, Fischer Asja, Tresp Volker, Lehmann Jens

Publication information

IEEE Trans Pattern Anal Mach Intell. 2022 Dec;44(12):8825-8845. doi: 10.1109/TPAMI.2021.3124805. Epub 2022 Nov 7.

DOI:10.1109/TPAMI.2021.3124805

PMID:34735335

Abstract

The heterogeneity in recently published knowledge graph embedding models' implementations, training, and evaluation has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 models in the PyKEEN software package. In this paper, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, as well as provide insight as to why this might be the case. We then performed a large-scale benchmarking on four datasets with several thousands of experiments and 24,804 GPU hours of computation time. We present insights gained as to best practices, best configurations for each model, and where improvements could be made over previously published best configurations. Our results highlight that the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations is crucial for a model's performance and is not only determined by its architecture. We provide evidence that several architectures can obtain results competitive to the state of the art when configured carefully. We have made all code, experimental configurations, results, and analyses available at https://github.com/pykeen/pykeen and https://github.com/pykeen/benchmarking.
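The abstract lists the explicit modeling of inverse relations among the configuration choices that affect performance. As a rough illustration of what that usually means in the knowledge graph embedding literature, the sketch below adds a triple (t, r_inverse, h) for every training triple (h, r, t), so that predicting a head entity becomes predicting a tail entity for the inverse relation. The function name, the `_inverse` suffix, and the toy triples are assumptions made for this example; this is not PyKEEN's implementation.

```python
from typing import List, Tuple

# A triple is (head, relation, tail), with entities and relations as plain strings.
Triple = Tuple[str, str, str]

# Hypothetical naming convention for inverse relations (an assumption of this sketch).
INVERSE_SUFFIX = "_inverse"


def add_inverse_triples(triples: List[Triple]) -> List[Triple]:
    """Return the original triples followed by one inverse triple per original."""
    augmented = list(triples)
    for head, relation, tail in triples:
        # Swap head and tail and use a separate relation label for the inverse.
        augmented.append((tail, relation + INVERSE_SUFFIX, head))
    return augmented


# Toy data, invented for illustration only.
triples = [
    ("berlin", "capital_of", "germany"),
    ("paris", "capital_of", "france"),
]
augmented = add_inverse_triples(triples)
```

With this augmentation a model learns a separate representation for each inverse relation, which is one reason the same architecture can behave differently with and without it.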

Similar articles

BioKEEN: a library for learning and evaluating biological knowledge graph embeddings.

Bioinformatics. 2019 Sep 15;35(18):3538-3540. doi: 10.1093/bioinformatics/btz117.

Multi-omics single-cell data integration and regulatory inference with graph-linked embedding.

Nat Biotechnol. 2022 Oct;40(10):1458-1466. doi: 10.1038/s41587-022-01284-4. Epub 2022 May 2.

Obtaining psychological embeddings through joint kernel and metric learning.

Behav Res Methods. 2019 Oct;51(5):2180-2193. doi: 10.3758/s13428-019-01285-3.

DSGAT: predicting frequencies of drug side effects by graph attention networks.

Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbab586.

AlphaGAN: Fully Differentiable Architecture Search for Generative Adversarial Networks.

IEEE Trans Pattern Anal Mach Intell. 2022 Oct;44(10):6752-6766. doi: 10.1109/TPAMI.2021.3099829. Epub 2022 Sep 14.

NeuroPycon: An open-source python toolbox for fast multi-modal and reproducible brain connectivity pipelines.

Neuroimage. 2020 Oct 1;219:117020. doi: 10.1016/j.neuroimage.2020.117020. Epub 2020 Jun 6.

MOABB: trustworthy algorithm benchmarking for BCIs.

J Neural Eng. 2018 Dec;15(6):066011. doi: 10.1088/1741-2552/aadea0. Epub 2018 Sep 4.

Integrating biological knowledge and gene expression data using pathway-guided random forests: a benchmarking study.

Bioinformatics. 2020 Aug 1;36(15):4301-4308. doi: 10.1093/bioinformatics/btaa483.

MVS-GCN: A prior brain structure learning-guided multi-view graph convolution network for autism spectrum disorder diagnosis.

Comput Biol Med. 2022 Mar;142:105239. doi: 10.1016/j.compbiomed.2022.105239. Epub 2022 Jan 19.

Cited by

Morphological map of under- and overexpression of genes in human cells.

Nat Methods. 2025 Aug;22(8):1742-1752. doi: 10.1038/s41592-025-02753-9. Epub 2025 Aug 7.

Bind: large-scale biological interaction network discovery through knowledge graph-driven machine learning.

J Transl Med. 2025 Jul 31;23(1):856. doi: 10.1186/s12967-025-06789-5.

Predicting Natural Product-Drug Interactions with Knowledge Graph Embeddings.

AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:556-565. eCollection 2025.

Predicting drug-gene relations via analogy tasks with word embeddings.

Sci Rep. 2025 May 18;15(1):17240. doi: 10.1038/s41598-025-01418-z.

Joint embedding-classifier learning for interpretable collaborative filtering.

BMC Bioinformatics. 2025 Jan 22;26(1):26. doi: 10.1186/s12859-024-06026-8.

Credibility-based knowledge graph embedding for identifying social brand advocates.

Front Big Data. 2024 Nov 20;7:1469819. doi: 10.3389/fdata.2024.1469819. eCollection 2024.

Fast polypharmacy side effect prediction using tensor factorization.

Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae706.

DDAffinity: predicting the changes in binding affinity of multiple point mutations using protein 3D structure.

Bioinformatics. 2024 Jun 28;40(Suppl 1):i418-i427. doi: 10.1093/bioinformatics/btae232.

Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning.

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae301.

BioBLP: a modular framework for learning on multimodal biomedical knowledge graphs.

J Biomed Semantics. 2023 Dec 8;14(1):20. doi: 10.1186/s13326-023-00301-y.
