Steinkamp Jackson M, Chambers Charles M, Lalevic Darco, Zafar Hanna M, Cook Tessa S
Department of Radiology, Hospital of the University of Pennsylvania, 3400 Spruce St, Philadelphia, PA 19104 (J.M.S., C.M.C., D.L., H.M.Z., T.S.C.); and Department of Radiology, Boston University School of Medicine, Boston, Mass (J.M.S.).
Radiol Artif Intell. 2019 Aug 7;1(5):e180052. doi: 10.1148/ryai.2019180052. eCollection 2019 Sep.
To evaluate the performance of machine learning algorithms on organ-level classification of semistructured pathology reports, with the goal of incorporating surgical pathology monitoring into an automated imaging recommendation follow-up engine.
This retrospective study included 2013 pathology reports from patients who underwent abdominal imaging at a large tertiary care center between 2012 and 2018. Two annotators labeled each report as relevant to one or more of four abdominal organs (liver, kidneys, pancreas, and adrenal glands) or to none of them. Six automated classification methods were compared: simple string matching, random forests, extreme gradient boosting, support vector machines, and two neural network architectures (convolutional neural networks and long short-term memory networks). Three methods from the literature were used to provide interpretability and qualitative validation of the learned network features.
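As a rough illustration of the simplest baseline named above, the following Python sketch assigns organ labels by keyword matching. The keyword lists, the `ORGAN_KEYWORDS` table, and the `classify_report` function are hypothetical and are not taken from the study; the abstract does not describe how its string-matching baseline was built.

```python
import re

# Hypothetical keyword table; the study's actual search terms are not given.
ORGAN_KEYWORDS = {
    "liver": ("liver", "hepatic"),
    "kidney": ("kidney", "renal"),
    "pancreas": ("pancreas", "pancreatic"),
    "adrenal": ("adrenal",),
}


def classify_report(text):
    """Return the set of organ labels whose keywords appear in the report.

    A report can receive several labels, or an empty set for "none".
    """
    lowered = text.lower()
    labels = set()
    for organ, keywords in ORGAN_KEYWORDS.items():
        # The leading \b keeps "renal" from matching inside "adrenal".
        if any(re.search(r"\b" + re.escape(k), lowered) for k in keywords):
            labels.add(organ)
    return labels
```

A rule of this kind has no notion of context, so a report that merely mentions an organ in the clinical history would be labeled as relevant to it, which is one plausible reason such a baseline trails the learned models.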
The neural networks performed well on the four-organ classification task (F1 score: 96.3% for convolutional neural network and 96.7% for long short-term memory vs 89.9% for support vector machines, 93.9% for extreme gradient boosting, 82.8% for random forests, and 75.2% for simple string matching). Multiple methods were used to visualize the decision-making process of the networks, verifying that the networks used heuristics similar to those of a human annotator. The neural networks classified pathology reports written in unseen formats with a high degree of accuracy, suggesting that the networks had learned a generalizable encoding of the salient features.
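For readers unfamiliar with the metric, the sketch below computes a per-organ F1 score and its unweighted mean for multilabel predictions. The abstract does not state how its F1 scores were averaged across organs, so the macro average here is an assumption, and the function names are hypothetical.

```python
def f1_per_label(true_sets, pred_sets, label):
    """F1 score for one organ label over paired true/predicted label sets."""
    pairs = list(zip(true_sets, pred_sets))
    tp = sum(1 for t, p in pairs if label in t and label in p)
    fp = sum(1 for t, p in pairs if label not in t and label in p)
    fn = sum(1 for t, p in pairs if label in t and label not in p)
    if tp == 0:
        # No true positives: F1 is defined here as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(true_sets, pred_sets, labels):
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    return sum(f1_per_label(true_sets, pred_sets, lab) for lab in labels) / len(labels)
```

Each report is represented as a set of organ labels, with the empty set standing for "none", so the same functions can score any of the classifiers compared in the study.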
Neural network-based approaches achieve high performance on organ-level pathology report classification, suggesting that it is feasible to use them within automated tracking systems. © RSNA, 2019. See also the commentary by Liu in this issue.