Automated Detection of Radiology Reports that Require Follow-up Imaging Using Natural Language Processing Feature Engineering and Machine Learning Classification.

Affiliations

Perelman School of Medicine at the University of Pennsylvania, 801 S 24th St #3, Philadelphia, PA, 19146, USA.

Hospital of the University of Pennsylvania, Philadelphia, PA, USA.

Publication information

J Digit Imaging. 2020 Feb;33(1):131-136. doi: 10.1007/s10278-019-00271-7.

DOI:10.1007/s10278-019-00271-7

PMID:31482317

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7064732/

Abstract

While radiologists regularly issue follow-up recommendations, our preliminary research has shown that 35% to 50% of patients who receive follow-up recommendations for findings of possible cancer on abdominopelvic imaging do not return for follow-up. As such, they remain at risk for adverse outcomes related to missed or delayed cancer diagnosis. In this study, we develop an algorithm that uses natural language processing (NLP) techniques and machine learning models to automatically detect free-text radiology reports that contain a follow-up recommendation. The data set used in this study consists of 6000 free-text reports from the authors' institution. NLP techniques are used to engineer 1500 features, which comprise the most informative unigrams, bigrams, and trigrams in the training corpus after tokenization and Porter stemming. On this data set, we train naive Bayes, decision tree, and maximum entropy models. The decision tree model, with an F1 score of 0.458 and accuracy of 0.862, outperforms both the naive Bayes (F1 score of 0.381) and maximum entropy (F1 score of 0.387) models. The models were analyzed to determine predictive features; the term frequency of n-grams such as "renal neoplasm" and "evalu with enhanc" was most predictive of a follow-up recommendation. Key to maximizing performance were feature engineering that extracts predictive information and appropriate selection of machine learning algorithms based on the feature set.
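The feature pipeline and the evaluation metrics described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, `crude_stem` is a toy suffix stripper standing in for Porter stemming (so it will not reproduce stems such as "evalu"), and ranking n-grams by raw corpus frequency is an assumption, since the abstract does not say how "most informative" was measured.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the report and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Toy suffix stripper; a stand-in for Porter stemming, NOT the real algorithm."""
    for suffix in ("ation", "ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, max_n=3):
    """Yield unigrams, bigrams, and trigrams as space-joined strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i : i + n])


def report_features(text):
    """Term-frequency counts of stemmed n-grams for one report."""
    stems = [crude_stem(t) for t in tokenize(text)]
    return Counter(ngrams(stems))


def select_vocabulary(reports, k):
    """Keep the k most frequent n-grams in the corpus (assumed ranking criterion)."""
    total = Counter()
    for text in reports:
        total.update(report_features(text))
    return [gram for gram, _ in total.most_common(k)]


def f1_and_accuracy(y_true, y_pred):
    """F1 for the positive (follow-up) class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return f1, accuracy
```

In this sketch, a classifier would be trained on the per-report counts restricted to the selected vocabulary (k = 1500 in the study). The metric function also shows why accuracy and F1 can diverge as they do in the reported results: when few reports carry a follow-up recommendation, correctly labeling the many negative reports raises accuracy while missed positives pull F1 down.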

Similar articles

Integrating Natural Language Processing and Machine Learning Algorithms to Categorize Oncologic Response in Radiology Reports.

J Digit Imaging. 2018 Apr;31(2):178-184. doi: 10.1007/s10278-017-0027-x.

Performance of a Machine Learning Classifier of Knee MRI Reports in Two Large Academic Radiology Practices: A Tool to Estimate Diagnostic Yield.

AJR Am J Roentgenol. 2017 Apr;208(4):750-753. doi: 10.2214/AJR.16.16128. Epub 2017 Jan 31.

Natural language processing and machine learning algorithm to identify brain MRI reports with acute ischemic stroke.

PLoS One. 2019 Feb 28;14(2):e0212778. doi: 10.1371/journal.pone.0212778. eCollection 2019.

Identification of Long Bone Fractures in Radiology Reports Using Natural Language Processing to support Healthcare Quality Improvement.

Appl Clin Inform. 2016 Nov 9;7(4):1051-1068. doi: 10.4338/ACI-2016-08-RA-0129.

Natural language processing and machine learning to enable automatic extraction and classification of patients' smoking status from electronic medical records.

Ups J Med Sci. 2020 Nov;125(4):316-324. doi: 10.1080/03009734.2020.1792010. Epub 2020 Jul 22.

Natural Language-based Machine Learning Models for the Annotation of Clinical Radiology Reports.

Radiology. 2018 May;287(2):570-580. doi: 10.1148/radiol.2018171093. Epub 2018 Jan 30.

Machine learning based natural language processing of radiology reports in orthopaedic trauma.

Comput Methods Programs Biomed. 2021 Sep;208:106304. doi: 10.1016/j.cmpb.2021.106304. Epub 2021 Jul 23.

Development of machine learning and natural language processing algorithms for preoperative prediction and automated identification of intraoperative vascular injury in anterior lumbar spine surgery.

Spine J. 2021 Oct;21(10):1635-1642. doi: 10.1016/j.spinee.2020.04.001. Epub 2020 Apr 12.

Ensemble Approaches to Recognize Protected Health Information in Radiology Reports.

J Digit Imaging. 2022 Dec;35(6):1694-1698. doi: 10.1007/s10278-022-00673-0. Epub 2022 Jun 17.

Cited by

Automatic Abstraction of Computed Tomography Imaging Indication Using Natural Language Processing for Evaluation of Surveillance Patterns in Long-Term Lung Cancer Survivors.

JCO Clin Cancer Inform. 2025 Jul;9:e2400279. doi: 10.1200/CCI-24-00279. Epub 2025 Jul 23.

Comprehensive comparison of the third-generation sequencing tools for bacterial 6mA profiling.

Nat Commun. 2025 Apr 28;16(1):3982. doi: 10.1038/s41467-025-59187-2.

Privacy-Preserving Large Language Model for Matching Findings and Tracking Interval Changes in Longitudinal Radiology Reports.

J Imaging Inform Med. 2025 Apr 11. doi: 10.1007/s10278-025-01478-7.

Automated Detection of Cancer-Suspicious Findings in Japanese Radiology Reports with Natural Language Processing: A Multicenter Study.

J Imaging Inform Med. 2025 Jan 22. doi: 10.1007/s10278-024-01338-w.

Artificial Intelligence to Improve Patient Understanding of Radiology Reports.

Yale J Biol Med. 2023 Sep 29;96(3):407-417. doi: 10.59249/NKOY5498. eCollection 2023 Sep.

How Natural Language Processing Can Aid With Pulmonary Oncology Tumor Node Metastasis Staging From Free-Text Radiology Reports: Algorithm Development and Validation.

JMIR Form Res. 2023 Mar 22;7:e38125. doi: 10.2196/38125.

Artificial intelligence and machine learning in cancer imaging.

Commun Med (Lond). 2022 Oct 27;2:133. doi: 10.1038/s43856-022-00199-0. eCollection 2022.

Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing.

JCO Clin Cancer Inform. 2022 Jul;6:e2200006. doi: 10.1200/CCI.22.00006.

Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT).

BMC Med Inform Decis Mak. 2022 Jul 30;22(1):200. doi: 10.1186/s12911-022-01946-y.

The Use of BP Neural Network Algorithm and Natural Language Processing in the Impact of Social Audit on Enterprise Innovation Ability.

Comput Intell Neurosci. 2022 May 18;2022:7297769. doi: 10.1155/2022/7297769. eCollection 2022.
