Ontology enrichment using a large language model: Applying lexical, semantic, and knowledge network-based similarity for concept placement.

Authors

Kollapally Navya Martin, Geller James, Keloth Vipina Kuttichi, He Zhe, Xu Julia

Affiliations

Kean University, United States.

New Jersey Institute of Technology, United States.

Publication information

J Biomed Inform. 2025 Aug;168:104865. doi: 10.1016/j.jbi.2025.104865. Epub 2025 Jun 19.

DOI:10.1016/j.jbi.2025.104865
PMID:40543734
Abstract

OBJECTIVE

Ontologies are essential for representing the knowledge of a domain. To be useful, an ontology must cover its domain comprehensively. Ontology enrichment requires discovering new concepts to add, either because they were missed when the ontology was first built or because the field has advanced and new real-world concepts have emerged. Our goal is to develop an automatic enrichment pipeline that uses a seed ontology, a Large Language Model (LLM), and a source of text. The pipeline is applied to the domain of Social Determinants of Health (SDoH), using PubMed as the source of concepts. In this work, we demonstrate the applicability and effectiveness of the enrichment pipeline by extending an SDoH ontology called SOHOv1; the methodology could also be used in other domains.

METHODS

We first retrieved PubMed abstracts of candidate articles, using existing SOHOv1 concepts as search terms. Next, we used GPT-4-1201 to extract semantic triples from the abstracts. We then identified concepts from these triples using lexical, semantic, and knowledge network-based filtering. We also compared the granularity of the semantic triples extracted with our method to that of the triples in SemMedDB (Semantic MEDLINE Database). The results were evaluated by human experts and with standard ontology tools that check consistency and semantic correctness.
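The lexical part of the filtering step can be illustrated with a minimal sketch. It assumes that a candidate term is dropped when it is a near-duplicate of a concept already in the seed ontology. The seed concepts, the triples, the 0.9 threshold, and the use of Python's `difflib.SequenceMatcher` are all illustrative assumptions; the abstract does not state which lexical measure or cutoff the authors used, and the semantic and knowledge network-based filters are not shown.

```python
from difflib import SequenceMatcher

# Hypothetical seed concepts and LLM-extracted triples, for illustration only.
SEED_CONCEPTS = ["food insecurity", "housing instability", "unemployment"]
TRIPLES = [
    ("food insecurity", "associated with", "poor glycemic control"),
    ("housing instabilty", "increases", "emergency department visits"),  # typo on purpose
    ("transportation barriers", "limit", "access to care"),
]


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def lexical_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed measure, not the paper's)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_concepts(triples, seed_concepts, threshold=0.9):
    """Return subject/object terms that are not near-duplicates of a seed concept."""
    candidates = []
    for subject, _predicate, obj in triples:
        for term in (subject, obj):
            best = max(lexical_similarity(term, seed) for seed in seed_concepts)
            cleaned = normalize(term)
            # Keep the term only if no seed concept is lexically close to it.
            if best < threshold and cleaned not in candidates:
                candidates.append(cleaned)
    return candidates


if __name__ == "__main__":
    for concept in candidate_concepts(TRIPLES, SEED_CONCEPTS):
        print(concept)
```

In this toy input, the exact match "food insecurity" and the misspelled "housing instabilty" are treated as already covered by the seed ontology, while the remaining subject and object terms survive as candidates for the later filters.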

RESULTS

We expanded SOHOv1, which contained 173 concepts and 585 axioms (including 207 logical axioms), into SOHOv2, which contains 572 concepts and 1,542 axioms (including 725 logical axioms). Our methods identified more concepts than were extracted from SemMedDB for the same task. Although we have shown the feasibility of our approach only for an SDoH ontology, the methodology is generalizable to other domains for which a seed ontology and a text corpus exist.
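The size of the expansion follows directly from the reported counts. The short calculation below uses only the figures stated in the results; the variable names are illustrative.

```python
# Counts reported in the abstract for the two ontology versions.
sohov1 = {"concepts": 173, "axioms": 585, "logical_axioms": 207}
sohov2 = {"concepts": 572, "axioms": 1542, "logical_axioms": 725}

# Absolute increase per category: 399 concepts, 957 axioms, 518 logical axioms.
added = {key: sohov2[key] - sohov1[key] for key in sohov1}

# Multiplicative growth per category (SOHOv2 size divided by SOHOv1 size).
growth = {key: sohov2[key] / sohov1[key] for key in sohov1}

if __name__ == "__main__":
    for key in sohov1:
        print(f"{key}: +{added[key]} (x{growth[key]:.2f})")
```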

CONCLUSIONS

The contributions of this work are: extracting semantic triples from PubMed abstracts with GPT-4-1201 and prompt chaining; showing that triples from GPT-4-1201 are superior to triples from SemMedDB for SDoH; combining lexical and semantic similarity search with knowledge network-based search to identify the concepts to be added to the ontology; and confirming the quality of the new concepts with human experts.
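Prompt chaining, in general terms, means feeding the output of one prompt into the next. The sketch below shows one possible two-step chain for triple extraction. The prompt wording, the `subject | predicate | object` line format, and the `fake_llm` stand-in are all hypothetical; the abstract does not describe the authors' actual prompts or how the model was called.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]


def extract_triples(abstract: str, llm: Callable[[str], str]) -> List[Triple]:
    """Two-step prompt chain: list concepts first, then ask for triples over them."""
    # Step 1: ask the model for the key concepts in the abstract.
    concepts = llm(
        "List the key concepts in this abstract, one per line:\n" + abstract
    )
    # Step 2: reuse step 1's output as context for the triple-extraction prompt.
    raw = llm(
        "Using only these concepts:\n" + concepts
        + "\nWrite 'subject | predicate | object' triples, one per line, "
        + "for this abstract:\n" + abstract
    )
    triples: List[Triple] = []
    for line in raw.splitlines():
        parts = [part.strip() for part in line.split("|")]
        # Skip any line that does not parse into exactly three non-empty fields.
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call so the sketch runs offline."""
    if prompt.startswith("List the key concepts"):
        return "food insecurity\ndiabetes"
    return "food insecurity | is associated with | diabetes\nmalformed line"


if __name__ == "__main__":
    print(extract_triples("Example abstract text.", fake_llm))
```

Passing the model in as a callable keeps the chain independent of any particular client library, so a real API call could replace `fake_llm` without changing the parsing logic.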
