Landes Paul, Patel Kunal, Huang Sean S, Webb Adam, Di Eugenio Barbara, Caragea Cornelia
Department of Computer Science, University of Illinois at Chicago.
Department of Emergency Medicine, University of Illinois at Chicago.
Proc Int Conf Comput Ling. 2022 Oct;2022:3709-3721.
Section identification is the process of demarcating and labeling the sections of a document. Such sections help readers search for information and place specific topics in context. The goal of this work is to segment clinical documentation into sections. The primary contribution is MedSecId, a publicly available set of 2,002 fully annotated medical notes from MIMIC-III. We also provide several baselines, source code, a pretrained model, and an analysis of the data that uses principal component analysis to show a relationship between medical concepts across sections.
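To make the task concrete, the following is a minimal sketch of section identification on a toy note. It is not the MedSecId method or any baseline from this work: it is a naive rule-based illustration that assumes section headers appear as a short capitalized phrase followed by a colon. The pattern `HEADER_RE`, the function `identify_sections`, and the sample note are all hypothetical and are not drawn from MIMIC-III.

```python
import re
from typing import List, Tuple

# Assumption: a header is a line that starts with a short capitalized
# phrase followed by a colon, e.g. "Chief Complaint: chest pain".
HEADER_RE = re.compile(r"^([A-Z][A-Za-z /]{1,40}):\s*(.*)$")


def identify_sections(note: str) -> List[Tuple[str, str]]:
    """Split a note into (label, text) pairs using the header pattern."""
    sections: List[Tuple[str, str]] = []
    label = "unlabeled"  # text that precedes the first header
    body: List[str] = []
    for line in note.splitlines():
        match = HEADER_RE.match(line.strip())
        if match is None:
            # Not a header: the line belongs to the current section.
            body.append(line)
            continue
        # A new header closes the section collected so far.
        text = "\n".join(body).strip()
        if text or label != "unlabeled":
            sections.append((label, text))
        label = match.group(1).strip().lower()
        # Text after the colon on the header line starts the new body.
        body = [match.group(2)] if match.group(2) else []
    # Close the final section.
    text = "\n".join(body).strip()
    if text or label != "unlabeled":
        sections.append((label, text))
    return sections


if __name__ == "__main__":
    # Invented sample note, for illustration only.
    sample = (
        "Admitted via ED.\n"
        "Chief Complaint: chest pain\n"
        "History of Present Illness:\n"
        "65M with HTN.\n"
        "Pain began at rest.\n"
        "Allergies: none"
    )
    for section_label, section_text in identify_sections(sample):
        print(f"[{section_label}] {section_text!r}")
```

A rule like this breaks down when headers are missing, inconsistently formatted, or embedded in running text, which is one reason an annotated corpus for training and evaluating section identification models is useful.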