Nakayama Jasmine Y, Hertzberg Vicki, Ho Joyce C
Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA.
Department of Computer Science, Emory University, Atlanta, GA.
AMIA Jt Summits Transl Sci Proc. 2019 May 6;2019:275-284. eCollection 2019.
Unstructured data from electronic health records hold potential for improving predictive models of health outcomes. Previous efforts to extract structured information from unstructured data have used text mining methods such as topic modeling and sentiment analysis. However, such methods do not account for abbreviations. Nursing notes contain valuable information about nurses' assessments and interventions, and abbreviations are common in them. Thus, abbreviation disambiguation may add insight when unstructured text is used for predictive modeling. We present a new process for extracting structured information from nursing notes through abbreviation normalization, lemmatization, and stop word removal. We found that disambiguating abbreviations in nursing notes before topic modeling and sentiment analysis improved prediction of in-hospital and 30-day mortality while controlling for comorbidity.
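The three preprocessing steps named in the abstract (abbreviation normalization, lemmatization, stop word removal) can be illustrated with a minimal sketch. This is not the authors' implementation: the abbreviation map, lemma table, stop word list, and the function name `normalize_note` are all hypothetical and deliberately tiny. The sketch also uses a single-sense lookup per abbreviation, which is simpler than true context-dependent disambiguation.

```python
import re

# Hypothetical abbreviation map (assumption); a real pipeline would draw on a
# curated clinical lexicon and choose a sense based on context.
ABBREVIATIONS = {
    "pt": "patient",
    "sob": "shortness of breath",
    "c/o": "complains of",
    "abd": "abdomen",
}

# Toy lemma table (assumption) standing in for a full lemmatizer.
LEMMAS = {"complains": "complain", "denies": "deny"}

# Toy stop word list (assumption).
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "was", "with", "to"}


def normalize_note(text):
    """Expand abbreviations, lemmatize, and drop stop words, in that order."""
    # Lowercase and tokenize, keeping slash-joined abbreviations such as "c/o".
    tokens = re.findall(r"[a-z]+(?:/[a-z]+)?", text.lower())

    # Step 1: abbreviation normalization (one expansion may yield several words).
    expanded = []
    for token in tokens:
        expanded.extend(ABBREVIATIONS.get(token, token).split())

    # Step 2: lemmatization via table lookup.
    lemmas = [LEMMAS.get(token, token) for token in expanded]

    # Step 3: stop word removal.
    return [token for token in lemmas if token not in STOP_WORDS]


if __name__ == "__main__":
    print(normalize_note("Pt c/o SOB, denies abd pain."))
```

The resulting token list is the kind of cleaned input that could then be passed to topic modeling or sentiment analysis. Expanding abbreviations first means that lemmatization and stop word removal also apply to the expanded words.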