Sevenster Merlijn, Bozeman Jeffrey, Cowhy Andrea, Trost William
Philips Research North America, Briarcliff Manor, NY.
University of Chicago, Chicago, IL.
AMIA Annu Symp Proc. 2013 Nov 16;2013:1262-71. eCollection 2013.
Radiological measurements are among the key variables in widely adopted guidelines (WHO, RECIST) that standardize response assessment in oncology care and make it more objective. Measurements are typically reported in free-text, narrative radiology reports. We present a natural language processing pipeline that extracts measurements from radiology reports and pairs each one with measurements of the same clinical finding (e.g., a lymph node or mass) extracted from prior reports. A ground truth was created by manually pairing measurements in the abdominal CT reports of 50 patients. A Random Forest classifier trained on 15 features achieved superior results in an end-to-end evaluation of the pipeline on the extraction and pairing task: precision 0.910, recall 0.878, F-measure 0.894, AUC 0.988. Representing the narrative content in terms of UMLS concepts did not improve results. Applications of the proposed technology include data mining, advanced search, and workflow support for healthcare professionals managing radiological measurements.
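To make the extraction step and the reported metrics concrete, the following is a minimal illustrative sketch, not the authors' implementation. The regular expression, the function names `extract_measurements` and `f_measure`, and the sample sentence are all assumptions introduced here; the abstract does not specify how measurements are recognized, and the Random Forest pairing step is not shown. The sketch finds measurements such as "1.2 x 0.8 cm" in narrative text, normalizes them to millimeters, and checks that the reported F-measure is the harmonic mean of the reported precision and recall.

```python
import re

# Hypothetical pattern (an assumption, not taken from the paper):
# one to three numbers separated by "x", followed by a unit of mm or cm.
MEASUREMENT = re.compile(
    r"(\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?){0,2})\s*(mm|cm)\b",
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return a list of (dimensions_in_mm, matched_text) tuples."""
    results = []
    for match in MEASUREMENT.finditer(text):
        numbers, unit = match.group(1), match.group(2).lower()
        scale = 10.0 if unit == "cm" else 1.0  # normalize everything to mm
        parts = re.split(r"\s*x\s*", numbers, flags=re.IGNORECASE)
        dims = tuple(round(float(p) * scale, 1) for p in parts)
        results.append((dims, match.group(0)))
    return results


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Invented sample sentence, for illustration only.
sentence = "Left para-aortic lymph node measures 1.2 x 0.8 cm, previously 9 x 7 mm."
found = extract_measurements(sentence)
for dims, raw in found:
    print(raw, "->", dims)

# Consistency check on the figures reported in the abstract.
print(round(f_measure(0.910, 0.878), 3))
```

A rule-based pattern like this covers only the extraction half of the task. Deciding that the current and prior measurements in the sample sentence describe the same finding is the pairing problem, which the abstract addresses with a trained classifier.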