Lu Jiaying, Shen Jiaming, Xiong Bo, Ma Wenjing, Staab Steffen, Yang Carl
Emory University, USA.
Google Research, USA.
Int ACM SIGIR Conf Res Dev Inf Retr. 2023 Jul;2023:2052-2056. doi: 10.1145/3539618.3591997. Epub 2023 Jul 18.
Comprehensive biomedical knowledge bases can enhance medical decision-making, but building them requires fusing knowledge graphs constructed from different sources via a uniform index system. The index system often organizes biomedical terms in a hierarchy so that entities can be aligned at a fine granularity. To address the scarcity of supervision in the biomedical knowledge fusion (BKF) task, researchers have proposed various unsupervised methods. However, these methods rely heavily on ad hoc lexical and structural matching algorithms, which fail to capture the rich semantics conveyed by biomedical entities and terms. Recently, neural embedding models have proved effective in semantics-rich tasks, but they require sufficient labeled data to be adequately trained. To bridge the gap between label-scarce BKF and neural embedding models, we propose HiPrompt, a supervision-efficient knowledge fusion framework that elicits the few-shot reasoning ability of large language models through hierarchy-oriented prompts. Empirical results on the collected KG-Hi-BKF benchmark datasets demonstrate the effectiveness of HiPrompt.
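The abstract does not specify the prompt format, so the following is only an illustrative sketch of what a hierarchy-oriented few-shot prompt might look like, not HiPrompt's actual implementation. It assumes the index system is available as a child-to-parent mapping and that a handful of labeled entity-term pairs serve as demonstrations. The names `hierarchy_path`, `build_prompt`, `PARENT_OF`, and `SHOTS`, the prompt wording, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple


def hierarchy_path(term: str, parent_of: Dict[str, str]) -> List[str]:
    """Walk from a term up to its root, stopping early if a cycle is detected."""
    path = [term]
    while path[-1] in parent_of and parent_of[path[-1]] not in path:
        path.append(parent_of[path[-1]])
    return path


def build_prompt(
    entity: str,
    candidate: str,
    parent_of: Dict[str, str],
    shots: List[Tuple[str, str, bool]],
) -> str:
    """Assemble a few-shot prompt in which every term is shown with its ancestors."""
    lines = ["Does the entity match the term? Answer Yes or No.", ""]
    # Labeled demonstrations (the "few shots"), each with hierarchy context.
    for shot_entity, shot_term, is_match in shots:
        lines += [
            f"Entity: {shot_entity}",
            f"Term (with ancestors): {' -> '.join(hierarchy_path(shot_term, parent_of))}",
            f"Answer: {'Yes' if is_match else 'No'}",
            "",
        ]
    # The query pair, left open for the language model to complete.
    lines += [
        f"Entity: {entity}",
        f"Term (with ancestors): {' -> '.join(hierarchy_path(candidate, parent_of))}",
        "Answer:",
    ]
    return "\n".join(lines)


# Toy hierarchy and demonstrations (illustrative only, not from KG-Hi-BKF).
PARENT_OF = {
    "type 2 diabetes mellitus": "diabetes mellitus",
    "diabetes mellitus": "endocrine system disease",
}
SHOTS = [
    ("T2DM", "type 2 diabetes mellitus", True),
    ("T2DM", "endocrine system disease", False),
]

prompt = build_prompt("adult-onset diabetes", "type 2 diabetes mellitus", PARENT_OF, SHOTS)
print(prompt)
```

The resulting string would be passed to a large language model of the reader's choice; the call itself is omitted because the abstract names no specific model or API.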