Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports.

Author information

Senders Joeky T, Karhade Aditya V, Cote David J, Mehrtash Alireza, Lamba Nayan, DiRisio Aislyn, Muskens Ivo S, Gormley William B, Smith Timothy R, Broekman Marike L D, Arnaout Omar

Affiliations

Brigham and Women's Hospital, Harvard Medical School, Boston, MA.

Haaglanden Medical Center, The Hague, the Netherlands.

Publication information

JCO Clin Cancer Inform. 2019 Apr;3:1-9. doi: 10.1200/CCI.18.00138.

DOI:10.1200/CCI.18.00138

PMID:31002562

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6873936/

Abstract

PURPOSE

Although the bulk of patient-generated health data are increasing exponentially, their use is impeded because most data come in unstructured format, namely as free-text clinical reports. A variety of natural language processing (NLP) methods have emerged to automate the processing of free text ranging from statistical to deep learning-based models; however, the optimal approach for medical text analysis remains to be determined. The aim of this study was to provide a head-to-head comparison of novel NLP techniques and inform future studies about their utility for automated medical text analysis.

PATIENTS AND METHODS

Magnetic resonance imaging reports of patients with brain metastases treated in two tertiary centers were retrieved and manually annotated using a binary classification (single metastasis versus two or more metastases). Multiple bag-of-words and sequence-based NLP models were developed and compared after randomly splitting the annotated reports into training and test sets in an 80:20 ratio.
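
The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea: a bag-of-words representation, a random 80:20 split, and an L1-penalized (LASSO) logistic regression. The one-line reports, the whitespace tokenizer, the proximal gradient solver, and all function names and hyperparameters are assumptions made for illustration, not the authors' pipeline.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def build_vocab(docs):
    """Map each distinct token in the training documents to a column index."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec


def split_80_20(items, seed=0):
    """Shuffle with a fixed seed, then cut at 80% for training."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.8 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, amount):
    """Shrink value toward zero by amount; this is the L1 proximal step."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, epochs=300):
    """Proximal gradient descent on mean logistic loss plus lam * sum(|w|)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        b -= lr * grad_b  # the intercept is left unpenalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(doc, vocab, w, b):
    x = vectorize(doc, vocab)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical one-line "reports"; label 1 means two or more metastases.
reports = [
    ("single enhancing lesion in the left frontal lobe", 0),
    ("single metastasis in the right cerebellum", 0),
    ("single ring enhancing lesion with edema", 0),
    ("single lesion in the left parietal lobe", 0),
    ("single metastasis with surrounding edema", 0),
    ("multiple enhancing lesions in both hemispheres", 1),
    ("multiple metastases in the cerebellum", 1),
    ("multiple ring enhancing lesions with edema", 1),
    ("multiple lesions in the frontal and parietal lobes", 1),
    ("multiple metastases with surrounding edema", 1),
]

train, test = split_80_20(reports)
vocab = build_vocab([text for text, _ in train])  # vocabulary from training data only
X_train = [vectorize(text, vocab) for text, _ in train]
y_train = [label for _, label in train]
w, b = fit_l1_logistic(X_train, y_train)
test_probs = [predict_proba(text, vocab, w, b) for text, _ in test]
```

The vocabulary is built from the training reports only, so the held-out reports cannot leak tokens into the model. The soft-thresholding step is what drives uninformative token weights to exactly zero, which is the feature-selection behavior usually associated with LASSO.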

RESULTS

A total of 1,479 radiology reports of patients diagnosed with brain metastases were retrieved. The least absolute shrinkage and selection operator (LASSO) regression model demonstrated the best overall performance on the hold-out test set with an area under the receiver operating characteristic curve of 0.92 (95% CI, 0.89 to 0.94), accuracy of 83% (95% CI, 80% to 87%), calibration intercept of -0.06 (95% CI, -0.14 to 0.01), and calibration slope of 1.06 (95% CI, 0.95 to 1.17).
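
For readers less familiar with the reported metrics, the sketch below shows one common way to compute them from held-out labels and predicted probabilities. It assumes the calibration slope comes from a logistic fit of the outcome on the logit of the predicted probability, and the calibration intercept from the same fit with the slope fixed at 1; the abstract does not state which convention the study used. The labels, probabilities, and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def roc_auc(labels, probs):
    """Share of positive/negative pairs ranked correctly; ties count half."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, probs, threshold=0.5):
    """Fraction of cases where the thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(a == y) for a, y in zip(preds, labels)) / len(labels)


def calibration_slope(labels, probs, iters=25):
    """Slope b of the logistic fit y ~ a + b * logit(p), by Newton's method.

    Assumes the probabilities are strictly between 0 and 1, are not all
    identical, and do not perfectly separate the two classes.
    """
    x = [logit(p) for p in probs]
    a, b = 0.0, 0.0
    for _ in range(iters):
        mu = [sigmoid(a + b * xi) for xi in x]
        g_a = sum(yi - mi for yi, mi in zip(labels, mu))
        g_b = sum((yi - mi) * xi for yi, mi, xi in zip(labels, mu, x))
        v = [mi * (1.0 - mi) for mi in mu]
        h_aa = sum(v)
        h_ab = sum(vi * xi for vi, xi in zip(v, x))
        h_bb = sum(vi * xi * xi for vi, xi in zip(v, x))
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return b


def calibration_intercept(labels, probs, iters=25):
    """Intercept a of the fit y ~ a + logit(p), with the slope fixed at 1."""
    x = [logit(p) for p in probs]
    a = 0.0
    for _ in range(iters):
        mu = [sigmoid(a + xi) for xi in x]
        a += sum(yi - mi for yi, mi in zip(labels, mu)) / sum(mi * (1.0 - mi) for mi in mu)
    return a


# Hypothetical held-out labels and predicted probabilities.
y_test = [0, 0, 1, 0, 1, 1, 0, 1]
p_test = [0.10, 0.30, 0.35, 0.45, 0.60, 0.70, 0.80, 0.90]

metrics = {
    "auc": roc_auc(y_test, p_test),
    "accuracy": accuracy(y_test, p_test),
    "calibration_intercept": calibration_intercept(y_test, p_test),
    "calibration_slope": calibration_slope(y_test, p_test),
}
```

Under this convention, an intercept near 0 and a slope near 1 indicate that predicted probabilities match observed frequencies on average, while a slope below 1 suggests predictions that are too extreme.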

CONCLUSION

Among various NLP techniques, the bag-of-words approach combined with a LASSO regression model demonstrated the best overall performance in extracting binary outcomes from free-text clinical reports. This study provides a framework for the development of machine learning-based NLP models as well as a clinical vignette of patients diagnosed with brain metastases.
