Center for Biomedical Informatics, Brown University, Providence, RI.
The Warren Alpert Medical School, Brown University, Providence, RI.
AMIA Annu Symp Proc. 2022 Feb 21;2021:418-427. eCollection 2021.
Clinical notes are a rich source of biomedical data for natural language processing (NLP). The identification of note sections represents a first step in creating portable NLP tools. Here, a system that used a heterogeneous hidden Markov model (HMM) was designed to identify seven note sections: (1) Medical History, (2) Medications, (3) Family and Social History, (4) Physical Exam, (5) Labs and Imaging, (6) Assessment and Plan, and (7) Review of Systems. Unified Medical Language System (UMLS) concepts were identified using MetaMap, and UMLS semantic type distributions for each section type were empirically determined. The UMLS semantic type distributions were used to train the HMM for identifying clinical note sections. The system was evaluated relative to a template boundary model using manually annotated notes from the Medical Information Mart for Intensive Care III. The results show promise for an approach to segment clinical notes into sections for subsequent NLP tasks.
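The decoding step described above can be illustrated with a minimal sketch. It assumes a plain first-order HMM in which each hidden state is a section label and each observation is the list of UMLS semantic type abbreviations found in one line of a note. The abstract does not spell out the details of the heterogeneous HMM, so this is not the authors' implementation: the three-section subset, all probabilities, the `FLOOR` smoothing constant, and the function names `emission_logp` and `viterbi` are invented for illustration. In the described system, the emission distributions would instead be estimated empirically from MetaMap output on annotated notes.

```python
import math

# Toy subset of the seven section labels. Every probability below is
# made up for illustration; none comes from the paper.
STATES = ["Medical History", "Medications", "Labs and Imaging"]

# P(first line belongs to section)
START = {"Medical History": 0.8, "Medications": 0.1, "Labs and Imaging": 0.1}

# P(next section | current section)
TRANS = {
    "Medical History": {"Medical History": 0.6, "Medications": 0.3, "Labs and Imaging": 0.1},
    "Medications": {"Medical History": 0.05, "Medications": 0.7, "Labs and Imaging": 0.25},
    "Labs and Imaging": {"Medical History": 0.05, "Medications": 0.05, "Labs and Imaging": 0.9},
}

# P(UMLS semantic type | section), using standard semantic type abbreviations:
# dsyn = Disease or Syndrome, sosy = Sign or Symptom,
# phsu = Pharmacologic Substance, lbpr = Laboratory Procedure.
EMIT = {
    "Medical History": {"dsyn": 0.6, "sosy": 0.2, "phsu": 0.1, "lbpr": 0.1},
    "Medications": {"dsyn": 0.1, "sosy": 0.05, "phsu": 0.8, "lbpr": 0.05},
    "Labs and Imaging": {"dsyn": 0.1, "sosy": 0.05, "phsu": 0.05, "lbpr": 0.8},
}

FLOOR = 1e-6  # hypothetical floor probability for unseen semantic types


def emission_logp(state, sem_types):
    """Log-probability of a line's semantic types, treated as independent draws."""
    return sum(math.log(EMIT[state].get(t, FLOOR)) for t in sem_types)


def viterbi(observations):
    """Return the most likely section label for each line of a note."""
    if not observations:
        return []
    # scores[i][s]: best log-probability of any path ending in state s at line i
    scores = [{s: math.log(START[s]) + emission_logp(s, observations[0]) for s in STATES}]
    back = []  # back[i][s]: best predecessor of state s at line i + 1
    for obs in observations[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + emission_logp(s, obs)
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state to recover the full path.
    path = [max(STATES, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# One list of semantic types per note line, as a concept extractor might yield.
note_lines = [["dsyn", "sosy"], ["dsyn"], ["phsu", "phsu"], ["phsu"], ["lbpr"], ["lbpr", "lbpr"]]
print(viterbi(note_lines))
```

Working in log space avoids numerical underflow on long notes, and the transition table is where typical section ordering (for example, history before medications) would be encoded.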