Topaz Maxim, Murga Ludmila, Bar-Bachar Ofrit, Cato Kenrick, Collins Sarah
School of Nursing, Columbia University, New York City, NY, USA.
Data Science Institute, Columbia University, New York City, NY, USA.
Stud Health Technol Inform. 2019 Aug 21;264:1056-1060. doi: 10.3233/SHTI190386.
We applied an open-source natural language processing (NLP) system, "NimbleMiner", to identify clinical notes that mention alcohol and substance abuse. NimbleMiner allows users to rapidly discover clinical vocabularies (using a word embedding model) and then apply machine learning for text classification. We used a large inpatient dataset with over 50,000 intensive care unit admissions (MIMIC II). Clinical notes included physician-written discharge summaries (n = 51,201) and nursing notes (n = 412,343). We first used the physician-written discharge summaries to train the system's algorithm, then added nursing notes to the discharge summaries and evaluated the algorithm's prediction accuracy. Adding nursing notes to the physician-written discharge summaries resulted in an almost two-fold vocabulary expansion. NimbleMiner slightly outperformed other state-of-the-art NLP systems (average F-score = .84) while requiring significantly less time for algorithm development. Our findings underline the importance of nursing data for the analysis of electronic patient records.
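The two-stage workflow described above (embedding-based vocabulary discovery, then classification scored by F-score) can be illustrated with a minimal sketch. This is not NimbleMiner's code: the toy word vectors, the 0.9 similarity threshold, and the helper names `expand_vocabulary`, `label_notes`, and `f_score` are hypothetical, and a simple keyword match stands in for the machine-learning classifier. In practice the vectors would be learned from the clinical notes themselves (for example with a word2vec-style model).

```python
import math

# Hypothetical toy word vectors, made up for illustration only.
# A real system would learn these from the clinical notes.
EMBEDDINGS = {
    "etoh":      [0.90, 0.10, 0.00],
    "alcohol":   [0.88, 0.15, 0.05],
    "drinking":  [0.80, 0.20, 0.10],
    "heroin":    [0.10, 0.90, 0.05],
    "ivdu":      [0.15, 0.85, 0.10],
    "pneumonia": [0.00, 0.10, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_vocabulary(seeds, embeddings, threshold=0.9):
    """Return the seed terms plus every word whose cosine similarity to at
    least one seed meets the (hypothetical) threshold."""
    vocab = set(seeds)
    for word, vec in embeddings.items():
        if any(cosine(vec, embeddings[s]) >= threshold
               for s in seeds if s in embeddings):
            vocab.add(word)
    return vocab


def label_notes(notes, vocab):
    """Keyword stand-in for a classifier: flag a note if any lowercased
    whitespace token is in the vocabulary."""
    return [any(tok in vocab for tok in note.lower().split()) for note in notes]


def f_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall over boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    vocab = expand_vocabulary(["alcohol", "heroin"], EMBEDDINGS)
    notes = ["Pt with hx of EtOH withdrawal", "Denies drinking",
             "Admitted for pneumonia", "Hx of IVDU"]
    gold = [True, False, False, True]  # hypothetical gold-standard labels
    pred = label_notes(notes, vocab)
    print(sorted(vocab))
    print(pred, round(f_score(gold, pred), 2))
```

With the seeds "alcohol" and "heroin", the toy vectors pull in "etoh", "drinking", and "ivdu". The keyword match then flags all four substance-related notes, including the negated "Denies drinking" (a false positive), giving an F-score of 0.8 on these made-up labels. Negated mentions like this are one reason a learned classifier, rather than keyword matching alone, is typically applied after vocabulary discovery.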