Harvard Medical School, Boston, Massachusetts; Center for Evidence-Based Imaging, Department of Radiology, Brigham and Women's Hospital, Brookline, Massachusetts.
J Am Coll Radiol. 2019 Mar;16(3):336-343. doi: 10.1016/j.jacr.2018.10.020. Epub 2018 Dec 29.
The aims of this study were to assess follow-up recommendations in radiology reports, develop and assess traditional machine learning (TML) and deep learning (DL) models in identifying follow-up, and benchmark them against a natural language processing (NLP) system.
This HIPAA-compliant, institutional review board-approved study was performed at an academic medical center generating >500,000 radiology reports annually. One thousand randomly selected ultrasound, radiography, CT, and MRI reports generated in 2016 were manually reviewed and annotated for follow-up recommendations. TML (support vector machines, random forest, logistic regression) and DL (recurrent neural nets) algorithms were constructed and trained on 850 reports (training data), with subsequent optimization of model architectures and parameters. Precision, recall, and F1 score were calculated on the remaining 150 reports (test data). A previously developed and validated NLP system (iSCOUT) was also applied to the test data, with equivalent metrics calculated.
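The three test-set metrics named above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `precision_recall_f1`, the encoding of labels as 1 (follow-up recommended) and 0 (no follow-up), and the toy label lists are all assumptions made for the example.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels.

    Assumed encoding: 1 = report contains a follow-up recommendation, 0 = it does not.
    """
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Toy data for illustration only; these are not labels from the study.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The same function could score any of the classifiers or the NLP system, since it only needs the manual annotations and the predicted labels for the held-out reports.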
Follow-up recommendations were present in 12.7% of reports. The TML algorithms achieved F1 scores of 0.75 (random forest), 0.83 (logistic regression), and 0.85 (support vector machine) on the test data. The DL recurrent neural networks had an F1 score of 0.71, the same as iSCOUT. For both TML and DL methods, F1 scores appeared to plateau once 500 to 700 training samples had been used.
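To give a sense of scale, the reported prevalence can be turned into approximate report counts. The sketch below assumes that the 12.7% figure refers to all 1,000 annotated reports and that the random 850/150 split roughly preserved that prevalence; the abstract states neither, so the per-split numbers are expectations, not reported results.

```python
# Figures taken from the abstract.
TOTAL_REPORTS = 1000
TRAIN_REPORTS = 850
TEST_REPORTS = 150
PREVALENCE_PERCENT = 12.7

# Assumption: the prevalence applies to the full annotated set of 1,000 reports.
positive_reports = round(TOTAL_REPORTS * PREVALENCE_PERCENT / 100)

# Assumption: the random split keeps roughly the same prevalence in each part.
expected_train_positives = positive_reports * TRAIN_REPORTS / TOTAL_REPORTS
expected_test_positives = positive_reports * TEST_REPORTS / TOTAL_REPORTS

print(positive_reports, expected_train_positives, expected_test_positives)
```

Under these assumptions, the test-set F1 scores would rest on roughly 19 reports containing a follow-up recommendation.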
TML and DL are feasible methods to identify follow-up recommendations. These methods have great potential for near real-time monitoring of follow-up recommendations in radiology reports.