Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2294-2298. doi: 10.1109/EMBC46164.2021.9630043.

DOI:10.1109/EMBC46164.2021.9630043

Abstract

The identification of rare diseases from clinical notes with Natural Language Processing (NLP) is challenging because few cases are available for machine learning and data annotation by clinical experts is needed. We propose a method using ontologies and weak supervision. The approach includes two steps: (i) Text-to-UMLS, linking text mentions to concepts in the Unified Medical Language System (UMLS), with a named entity linking tool (e.g. SemEHR) and weak supervision based on customised rules and contextual representations from Bidirectional Encoder Representations from Transformers (BERT), and (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). Using MIMIC-III US intensive care discharge summaries as a case study, we show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts. Our analysis shows that the overall pipeline for processing discharge summaries can surface rare disease cases, most of which are not captured in the manual ICD codes of the hospital admissions.
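The two steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Mention` type, the placeholder concept IDs, the `MIN_MENTION_LENGTH` threshold, and the single length-based rule are all assumptions made for illustration. In the described method, rule-derived weak labels are combined with BERT-based contextual representations; here the rule is applied directly as a filter to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Mention:
    text: str  # surface form found in the clinical note
    cui: str   # UMLS concept ID proposed by an entity linking tool


# Placeholder mapping from UMLS concepts to ORDO rare diseases.
# These IDs are made up; a real mapping would come from the ontologies.
UMLS_TO_ORDO = {"CUI_DEMO_1": "ORDO_DEMO_1"}

# Assumed threshold: very short mentions are often ambiguous abbreviations.
MIN_MENTION_LENGTH = 4


def weak_label(mention: Mention) -> bool:
    """Hypothetical rule that labels a mention without expert annotation."""
    return len(mention.text) >= MIN_MENTION_LENGTH


def text_to_umls(
    mentions: List[Mention],
    keep: Callable[[Mention], bool] = weak_label,
) -> List[Mention]:
    """Step (i): keep only the linked mentions that the weak rule accepts."""
    return [m for m in mentions if keep(m)]


def umls_to_ordo(cui: str) -> Optional[str]:
    """Step (ii): match a UMLS concept to an ORDO rare disease, if any."""
    return UMLS_TO_ORDO.get(cui)


def identify_rare_diseases(mentions: List[Mention]) -> List[Tuple[str, str]]:
    """Run both steps and return (mention text, ORDO ID) pairs."""
    results = []
    for m in text_to_umls(mentions):
        ordo_id = umls_to_ordo(m.cui)
        if ordo_id is not None:
            results.append((m.text, ordo_id))
    return results
```

Calling `identify_rare_diseases` on a list of linked mentions drops those rejected by the rule and those whose concept has no ORDO match. Keeping the two steps as separate functions mirrors the abstract's split between mention filtering and ontology matching, so either step could be swapped out independently.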

Similar articles

Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision.

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2294-2298. doi: 10.1109/EMBC46164.2021.9630043.

Ontology-driven and weakly supervised rare disease identification from clinical notes.

BMC Med Inform Decis Mak. 2023 May 5;23(1):86. doi: 10.1186/s12911-023-02181-9.

A hybrid framework with large language models for rare disease phenotyping.

BMC Med Inform Decis Mak. 2024 Oct 8;24(1):289. doi: 10.1186/s12911-024-02698-7.

Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)-based ranking for concept normalization.

J Am Med Inform Assoc. 2020 Oct 1;27(10):1510-1519. doi: 10.1093/jamia/ocaa080.

Classifying the lifestyle status for Alzheimer's disease from clinical notes using deep learning with weak supervision.

BMC Med Inform Decis Mak. 2022 Jul 7;22(Suppl 1):88. doi: 10.1186/s12911-022-01819-4.

An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontology-Enhanced Large Language Models: Development Study.

JMIR Med Inform. 2024 Dec 18;12:e60665. doi: 10.2196/60665.

Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.

BMC Med Inform Decis Mak. 2017 Dec 1;17(1):155. doi: 10.1186/s12911-017-0556-8.

Predicting mortality in critically ill patients with diabetes using machine learning and clinical notes.

BMC Med Inform Decis Mak. 2020 Dec 30;20(Suppl 11):295. doi: 10.1186/s12911-020-01318-4.

Identification of asthma control factor in clinical notes using a hybrid deep learning model.

BMC Med Inform Decis Mak. 2021 Nov 9;21(Suppl 7):272. doi: 10.1186/s12911-021-01633-4.

Autonomous International Classification of Diseases Coding Using Pretrained Language Models and Advanced Prompt Learning Techniques: Evaluation of an Automated Analysis System Using Medical Text.

JMIR Med Inform. 2025 Jan 6;13:e63020. doi: 10.2196/63020.

Cited by

A labeled medical records corpus for the timely detection of rare diseases using machine learning approaches.

Sci Rep. 2025 Feb 26;15(1):6932. doi: 10.1038/s41598-025-90450-0.

Weakly supervised spatial relation extraction from radiology reports.

JAMIA Open. 2023 Apr 22;6(2):ooad027. doi: 10.1093/jamiaopen/ooad027. eCollection 2023 Jul.

An overview of biomedical entity linking throughout the years.

J Biomed Inform. 2023 Jan;137:104252. doi: 10.1016/j.jbi.2022.104252. Epub 2022 Dec 2.

Automated clinical coding: what, why, and where we are?

NPJ Digit Med. 2022 Oct 22;5(1):159. doi: 10.1038/s41746-022-00705-7.
