Division of Vascular and Interventional Radiology, Department of Medical Imaging, Toronto General Hospital-University Health Network/University of Toronto, Toronto, Canada.
J Am Coll Radiol. 2019 Jun;16(6):840-844. doi: 10.1016/j.jacr.2018.12.004. Epub 2019 Mar 2.
Radiology is a finite health care resource in high demand at most health centers. However, anticipating fluctuations in demand is a challenge because of the inherent uncertainty in disease prognosis. The aim of this study was to explore the potential of natural language processing (NLP) to predict downstream radiology resource utilization in patients undergoing surveillance for hepatocellular carcinoma (HCC).
All HCC surveillance CT examinations performed at our institution from January 1, 2010, to October 31, 2017, were selected from our departmental radiology information system. We used open source NLP and machine learning software to parse radiology report text into bag-of-words and term frequency-inverse document frequency (TF-IDF) representations. Three machine learning models (logistic regression, support vector machine [SVM], and random forest) were used to predict future utilization of radiology department resources. A test data set was used to calculate accuracy, sensitivity, specificity, and the area under the curve (AUC).
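To make the two text representations concrete, here is a minimal standard-library sketch of bag-of-words counts and one common TF-IDF weighting (term frequency times log(N/df)). It is an illustration only, not the software used in the study: the function names, the whitespace tokenizer, and the three report snippets are all hypothetical, and library implementations typically use smoothed or normalized variants of this formula.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def bag_of_words(docs):
    """Return one term-count mapping per document."""
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    counts = bag_of_words(docs)
    n_docs = len(counts)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts once per term
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {term: (n / total) * math.log(n_docs / doc_freq[term])
             for term, n in c.items()}
        )
    return weights


# Hypothetical report snippets, invented for illustration.
reports = [
    "no focal liver lesion",
    "new arterially enhancing liver lesion",
    "stable cirrhotic liver no lesion",
]
bow = bag_of_words(reports)
weights = tf_idf(reports)
```

In this toy corpus, terms that occur in every report (such as "liver") receive a TF-IDF weight of zero, while a term confined to one report (such as "new") is weighted up. That down-weighting of ubiquitous vocabulary is the main difference from raw bag-of-words counts.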
As a group, the bag-of-words models performed slightly worse than the TF-IDF models. The TF-IDF + SVM model outperformed all other models, with an accuracy of 92%, a sensitivity of 83%, a specificity of 96%, and an AUC of 0.971.
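For readers less familiar with these metrics, the sketch below shows how accuracy, sensitivity, specificity, and AUC are conventionally computed for a binary classifier. The labels, scores, and 0.5 threshold are invented for illustration and are unrelated to the study data; the function names are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = follow-up imaging was used, 0 = it was not (hypothetical labels).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can have an AUC well above its accuracy.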
NLP-based models can accurately predict downstream radiology resource utilization from narrative HCC surveillance reports and have potential for translation to health care management, where they may improve decision making, reduce costs, and broaden access to care.