Assessing the effects of hyperparameters on knowledge graph embedding quality.

Authors

Oliver Lloyd, Yi Liu, Tom R Gaunt

Affiliation

MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK.

Publication information

J Big Data. 2023;10(1):59. doi: 10.1186/s40537-023-00732-5. Epub 2023 May 6.

DOI: 10.1186/s40537-023-00732-5
PMID: 37168524
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10164002/

Abstract

Embedding knowledge graphs into low-dimensional spaces is a popular way of applying methods such as link prediction or node classification to these databases. The embedding process is very costly in both computational time and space. Part of the reason is the optimisation of hyperparameters, which involves repeatedly sampling from a large hyperparameter space, by random, guided, or brute-force selection, and testing the quality of the resulting embeddings. However, not all hyperparameters in this search space are equally important. In fact, with prior knowledge of the relative importance of the hyperparameters, some could be eliminated from the search altogether without significantly affecting the overall quality of the output embeddings. To this end, we ran a Sobol sensitivity analysis to evaluate the effects of tuning different hyperparameters on the variance of embedding quality. This was achieved by performing thousands of embedding trials, each time measuring the quality of the embeddings produced by a different hyperparameter configuration. We regressed the embedding quality on those hyperparameter configurations and used the resulting model to generate Sobol sensitivity indices for each of the hyperparameters. By evaluating the correlation between Sobol indices, we find substantial variability in the hyperparameter sensitivities between knowledge graphs, with differing dataset characteristics as the probable cause of these inconsistencies. As an additional contribution of this work, we identify several relations in the UMLS knowledge graph that may cause data leakage via inverse relations, and we derive and present UMLS-43, a leakage-robust variant of that graph.
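To make the idea of a first-order Sobol index concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a hypothetical surrogate function `surrogate_quality` standing in for a regression model fitted to embedding trials, with three illustrative hyperparameters (`learning_rate`, `embedding_dim`, `num_negatives`) each rescaled to the unit interval. The index for hyperparameter i is S_i = Var(E[Y | X_i]) / Var(Y), estimated here with the Saltelli (2010) pick-freeze estimator using only the Python standard library.

```python
import random


def surrogate_quality(learning_rate, embedding_dim, num_negatives):
    """Hypothetical surrogate for embedding quality (an assumption for illustration).

    All inputs are assumed to be rescaled to [0, 1]. The coefficients are
    arbitrary: one strong effect, one weak effect, one with no effect.
    """
    return 4.0 * learning_rate + 1.0 * embedding_dim + 0.0 * num_negatives


def first_order_sobol(model, n_params, n_samples=20000, seed=0):
    """Estimate first-order Sobol indices of `model` on the unit hypercube."""
    rng = random.Random(seed)
    # Two independent sample matrices, A and B, with one row per sample.
    a_rows = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    b_rows = [[rng.random() for _ in range(n_params)] for _ in range(n_samples)]
    f_a = [model(*row) for row in a_rows]
    f_b = [model(*row) for row in b_rows]

    # Total output variance, estimated from both sets of evaluations.
    outputs = f_a + f_b
    mean = sum(outputs) / len(outputs)
    variance = sum((y - mean) ** 2 for y in outputs) / (len(outputs) - 1)

    indices = []
    for i in range(n_params):
        # AB_i: rows of A with column i replaced by column i of B.
        f_ab = [model(*(a[:i] + [b[i]] + a[i + 1:])) for a, b in zip(a_rows, b_rows)]
        # Saltelli (2010) estimator of Var(E[Y | X_i]).
        partial = sum(fb * (fab - fa) for fb, fab, fa in zip(f_b, f_ab, f_a)) / n_samples
        indices.append(partial / variance)
    return indices


indices = first_order_sobol(surrogate_quality, n_params=3)
for name, s in zip(["learning_rate", "embedding_dim", "num_negatives"], indices):
    print(f"{name}: {s:.3f}")
```

For this additive toy surrogate the analytic indices are 16/17, 1/17 and 0, so the estimates should land close to those values. A hyperparameter whose index is near zero contributes almost nothing to the variance of the output, which is the sense in which it could be dropped from a search.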

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1186/s40537-023-00732-5.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cb4/10164002/99b459aa2e29/40537_2023_732_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cb4/10164002/70de9f0b2887/40537_2023_732_Fig2_HTML.jpg
Figure 3a: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3cb4/10164002/1edee256b0ed/40537_2023_732_Fig3a_HTML.jpg

Similar articles

1. Assessing the effects of hyperparameters on knowledge graph embedding quality. J Big Data. 2023;10(1):59. doi: 10.1186/s40537-023-00732-5. Epub 2023 May 6.
2. Use of word and graph embedding to measure semantic relatedness between Unified Medical Language System concepts. J Am Med Inform Assoc. 2020 Oct 1;27(10):1538-1546. doi: 10.1093/jamia/ocaa136.
3. Survey on graph embeddings and their applications to machine learning problems on graphs. PeerJ Comput Sci. 2021 Feb 4;7:e357. doi: 10.7717/peerj-cs.357. eCollection 2021.
4. Ant Colony-Based Hyperparameter Optimisation in Total Variation Reconstruction in X-ray Computed Tomography. Sensors (Basel). 2021 Jan 15;21(2):591. doi: 10.3390/s21020591.
5. FuseLinker: Leveraging LLM's pre-trained text embeddings and domain knowledge to enhance GNN-based link prediction on biomedical knowledge graphs. J Biomed Inform. 2024 Oct;158:104730. doi: 10.1016/j.jbi.2024.104730. Epub 2024 Sep 24.
6. Application and evaluation of knowledge graph embeddings in biomedical data. PeerJ Comput Sci. 2021 Feb 18;7:e341. doi: 10.7717/peerj-cs.341. eCollection 2021.
7. Edge-Centric Embeddings of Digraphs: Properties and Stability Under Sparsification. Entropy (Basel). 2025 Mar 14;27(3):304. doi: 10.3390/e27030304.
8. Community detection in networks using graph embeddings. Phys Rev E. 2021 Feb;103(2-1):022316. doi: 10.1103/PhysRevE.103.022316.
9. Improving classification accuracy of fine-tuned CNN models: Impact of hyperparameter optimization. Heliyon. 2024 Feb 23;10(5):e26586. doi: 10.1016/j.heliyon.2024.e26586. eCollection 2024 Mar 15.
10. LogicENN: A Neural Based Knowledge Graphs Embedding Model With Logical Rules. IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):7050-7062. doi: 10.1109/TPAMI.2021.3121646. Epub 2023 May 5.

Cited by

1. Fine-tuning LLM hyperparameters to align semantic and physiological contexts of aging-related pathways. Mol Divers. 2025 Jun 6. doi: 10.1007/s11030-025-11226-2.
2. Triangulating evidence in health sciences with Annotated Semantic Queries. Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae519.

References

1. Sobol Sensitivity Analysis: A Tool to Guide the Development and Evaluation of Systems Pharmacology Models. CPT Pharmacometrics Syst Pharmacol. 2015 Feb;4(2):69-79. doi: 10.1002/psp4.6.
2. The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267-70. doi: 10.1093/nar/gkh061.