Divita Guy, Luo Gang, Tran Le-Thuy T, Workman T Elizabeth, Gundlapalli Adi V, Samore Matthew H
VA Salt Lake City Health Care System, Salt Lake City, Utah, USA.
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, USA.
Stud Health Technol Inform. 2017;245:356-360.
There is a need to catalog signs and symptoms, but not all are documented in structured data. The text of clinical records is an additional source of signs and symptoms. We describe a natural language processing (NLP) technique to identify symptoms from text. Using a human-annotated reference corpus of VA electronic medical notes, we trained and tested an NLP pipeline to identify and categorize symptoms. The technique includes a model created with an automatic machine learning model selection tool. Tested on a hold-out set, the pipeline achieved a mention-level precision of 0.80, a recall of 0.74, and an overall F-score of 0.80. The tool was scaled up to process a large corpus of 964,105 patient records.
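To illustrate what mention-level scoring generally involves, here is a minimal sketch in Python. It assumes exact-span matching between annotated and predicted mentions and the balanced F1 measure. The abstract does not state the matching criterion or the F-measure variant used, so both are assumptions, and the `Mention` type, the function name, and the toy data are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical mention representation: (document id, start offset, end offset).
Mention = Tuple[str, int, int]


def mention_level_scores(
    gold: Set[Mention], predicted: Set[Mention]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-span matching (an assumption)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Balanced F1: harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data for illustration only; not taken from the study.
gold = {("note1", 0, 5), ("note1", 10, 18), ("note2", 3, 9), ("note2", 20, 27)}
predicted = {
    ("note1", 0, 5),
    ("note1", 10, 18),
    ("note2", 3, 9),
    ("note3", 1, 4),
    ("note3", 8, 12),
}

precision, recall, f1 = mention_level_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Three of the five predicted mentions in the toy data match a gold mention, and three of the four gold mentions are found. A looser criterion, such as counting overlapping spans as matches, would give different counts.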