Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2294-2298. doi: 10.1109/EMBC46164.2021.9630043.

Abstract

The identification of rare diseases from clinical notes with Natural Language Processing (NLP) is challenging due to the few cases available for machine learning and the need for data annotation from clinical experts. We propose a method using ontologies and weak supervision. The approach includes two steps: (i) Text-to-UMLS, linking text mentions to concepts in the Unified Medical Language System (UMLS), with a named entity linking tool (e.g., SemEHR) and weak supervision based on customised rules and contextual representations from Bidirectional Encoder Representations from Transformers (BERT), and (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in the Orphanet Rare Disease Ontology (ORDO). Using MIMIC-III US intensive care discharge summaries as a case study, we show that the Text-to-UMLS process can be greatly improved with weak supervision, without any annotated data from domain experts. Our analysis shows that the overall pipeline, applied to discharge summaries, can surface rare disease cases that are mostly uncaptured in the manual ICD codes of the hospital admissions.
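The two steps described in the abstract can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Mention` class, the `weak_label` rule (a simple minimum-length check), the `UMLS_TO_ORDO` table, and the placeholder identifiers are hypothetical and are not taken from the paper, SemEHR, UMLS, or ORDO. The sketch also applies the rule directly as a filter, whereas the abstract describes using customised rules as weak supervision together with BERT-based contextual representations.

```python
from dataclasses import dataclass

# Hypothetical UMLS-to-ORDO lookup; the identifiers are placeholders, not real codes.
UMLS_TO_ORDO = {
    "CUI_A": "ORDO_1",
    "CUI_B": "ORDO_2",
}


@dataclass(frozen=True)
class Mention:
    text: str  # surface form found in the clinical note
    cui: str   # UMLS concept proposed by an entity linking tool


def weak_label(mention: Mention, min_chars: int = 4) -> bool:
    """Illustrative rule (an assumption): very short mentions are often
    ambiguous abbreviations, so treat them as likely false positives."""
    return len(mention.text) >= min_chars


def identify_rare_diseases(mentions):
    """Step (i): keep mentions that pass the weak rule.
    Step (ii): map the surviving UMLS concepts to ORDO entries."""
    kept = [m for m in mentions if weak_label(m)]
    return sorted({UMLS_TO_ORDO[m.cui] for m in kept if m.cui in UMLS_TO_ORDO})


# Toy input standing in for the output of an entity linker on one note.
mentions = [
    Mention("Huntington disease", "CUI_A"),  # passes the rule, has an ORDO match
    Mention("HD", "CUI_B"),                  # too short, filtered out by the rule
    Mention("pneumonia", "CUI_C"),           # passes the rule, no ORDO match
]
result = identify_rare_diseases(mentions)
print(result)
```

In this toy run only the first mention survives both steps. A fuller pipeline would replace the hand-written rule with a classifier trained on the weakly labelled mentions, which is the role the abstract assigns to weak supervision.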
