Kim Youngjun, Garvin Jennifer H, Goldstein Mary K, Hwang Tammy S, Redd Andrew, Bolton Dan, Heidenreich Paul A, Meystre Stéphane M
School of Computing, University of Utah, Salt Lake City, UT, USA; VA Health Care System, Salt Lake City, UT, USA.
VA Health Care System, Salt Lake City, UT, USA; Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA.
J Biomed Inform. 2017 Mar;67:42-48. doi: 10.1016/j.jbi.2017.01.017. Epub 2017 Feb 2.
Efforts to improve the treatment of congestive heart failure, a common and serious medical condition, include the use of quality measures to assess guideline-concordant care. The goal of this study is to identify left ventricular ejection fraction (LVEF) information in various types of clinical notes and then to use this information for heart failure quality measurement. We analyzed the annotation differences between a new corpus of clinical notes from the Echocardiography, Radiology, and Text Integrated Utility package and other corpora annotated for natural language processing (NLP) research in the Department of Veterans Affairs. These reports contain varying degrees of structure. To examine whether the LVEF extraction modules we developed in prior research improve the accuracy of LVEF information extraction from the new corpus, we created two sequence-tagging NLP modules trained on a new data set, one with and one without predictions from the existing LVEF extraction modules. We also conducted a set of experiments to examine the impact of training data size on information extraction accuracy. We found that less training data is needed when reports are highly structured, and that adding predictions from existing LVEF extraction modules improves information extraction when reports are less structured and use a richer vocabulary.
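The comparison described above, a sequence tagger trained with or without predictions from an existing extraction module, can be illustrated with a minimal feature-building sketch. Everything in it is an assumption made for illustration: the abstract does not name the tagger, the feature set, or the internals of the earlier modules, so `existing_module_predict` is a crude hypothetical stand-in based on regular expressions, and `token_features` and `featurize` are hypothetical helper names. The only point shown is how a prior module's per-token label can be appended as one extra feature for a downstream tagger.

```python
import re
from typing import Dict, List, Optional

# Hypothetical stand-in for an "existing LVEF extraction module".
# These crude patterns are illustrative only, not the authors' system.
_EF_MENTION = re.compile(r"^(lvef|ef)$", re.IGNORECASE)
_EF_VALUE = re.compile(r"^\d{1,2}(-\d{1,2})?%?$")


def existing_module_predict(tokens: List[str]) -> List[str]:
    """Label each token as MENTION, VALUE, or O using simple rules."""
    labels = []
    for tok in tokens:
        if _EF_MENTION.match(tok):
            labels.append("MENTION")
        elif _EF_VALUE.match(tok):
            labels.append("VALUE")
        else:
            labels.append("O")
    return labels


def token_features(
    tokens: List[str], i: int, prior: Optional[List[str]] = None
) -> Dict[str, object]:
    """Build a feature dict for token i, optionally adding the prior label."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "has_percent": "%" in tok,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    if prior is not None:
        # Stacked feature: the existing module's guess for this token.
        feats["prior_label"] = prior[i]
    return feats


def featurize(tokens: List[str], use_existing: bool) -> List[Dict[str, object]]:
    """Return one feature dict per token, with or without prior predictions."""
    prior = existing_module_predict(tokens) if use_existing else None
    return [token_features(tokens, i, prior) for i in range(len(tokens))]
```

Calling `featurize(tokens, use_existing=False)` and `featurize(tokens, use_existing=True)` on the same tokenized report yields the two feature variants; either list of dicts could then be passed to a sequence tagger that accepts dictionary features. The design keeps the downstream model unchanged, so any difference in accuracy can be attributed to the extra `prior_label` feature.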