Natural Language Processing for Automated Quantification of Brain Metastases Reported in Free-Text Radiology Reports.

Authors

Senders Joeky T, Karhade Aditya V, Cote David J, Mehrtash Alireza, Lamba Nayan, DiRisio Aislyn, Muskens Ivo S, Gormley William B, Smith Timothy R, Broekman Marike L D, Arnaout Omar

Affiliations

Brigham and Women's Hospital, Harvard Medical School, Boston, MA.

Haaglanden Medical Center, The Hague, the Netherlands.

Publication information

JCO Clin Cancer Inform. 2019 Apr;3:1-9. doi: 10.1200/CCI.18.00138.

Abstract

PURPOSE

Although the volume of patient-generated health data is increasing exponentially, the use of these data is impeded because most come in an unstructured format, namely as free-text clinical reports. A variety of natural language processing (NLP) methods, ranging from statistical to deep learning-based models, have emerged to automate the processing of free text; however, the optimal approach for medical text analysis remains to be determined. The aim of this study was to provide a head-to-head comparison of novel NLP techniques and inform future studies about their utility for automated medical text analysis.

PATIENTS AND METHODS

Magnetic resonance imaging reports of patients with brain metastases treated in two tertiary centers were retrieved and manually annotated using a binary classification (single metastasis v two or more metastases). Multiple bag-of-words and sequence-based NLP models were developed and compared after randomly splitting the annotated reports into training and test sets in an 80:20 ratio.
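
The pipeline described above (bag-of-words features, a random 80:20 split, and an L1-penalized model) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy reports, the label coding (0 = single metastasis, 1 = two or more), the helper names, and the settings `lam`, `lr`, and `epochs` are all assumptions made for illustration, and the LASSO-penalized logistic regression is fitted here with a simple proximal gradient loop rather than an established library.

```python
import math
import random
import re
from collections import Counter

def tokenize(text):
    """Lowercase a report and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(docs):
    """Assign a column index to every word seen in the training reports."""
    words = sorted({t for d in docs for t in tokenize(d)})
    return {w: i for i, w in enumerate(words)}

def vectorize(doc, vocab):
    """Bag-of-words counts; word order and unseen words are ignored."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokenize(doc)).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """Predicted probability of the positive class (two or more metastases)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def soft_threshold(value, t):
    """Proximal operator of the L1 penalty: shrink toward zero by t."""
    return math.copysign(max(abs(value) - t, 0.0), value)

def fit_lasso_logistic(X, y, lam=0.01, lr=0.1, epochs=500):
    """L1-penalized logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Hypothetical toy reports (label 0 = single metastasis, 1 = two or more).
locations = ["left frontal lobe", "right parietal lobe", "cerebellum",
             "left occipital lobe", "right temporal lobe"]
reports = []
for loc in locations:
    reports.append((f"Solitary enhancing lesion in the {loc}.", 0))
    reports.append((f"A single metastasis is seen in the {loc}.", 0))
    reports.append((f"Multiple enhancing lesions, the largest in the {loc}.", 1))
    reports.append((f"Numerous metastases are seen, including the {loc}.", 1))

# Random 80:20 split into training and hold-out test sets.
random.Random(0).shuffle(reports)
cut = len(reports) * 8 // 10
train, test = reports[:cut], reports[cut:]

# The vocabulary is built from the training reports only.
vocab = build_vocab([text for text, _ in train])
X_train = [vectorize(text, vocab) for text, _ in train]
y_train = [label for _, label in train]
w, b = fit_lasso_logistic(X_train, y_train)

# Print the true label and predicted probability for each hold-out report.
for text, label in test:
    print(label, round(predict_proba(vectorize(text, vocab), w, b), 2), text)
```

Because the L1 penalty shrinks uninformative weights toward zero, the fitted model tends to rely on a small set of words, which is one reason such models are easy to inspect. With real reports, an established library would normally replace the hand-written fitting loop.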

RESULTS

A total of 1,479 radiology reports of patients diagnosed with brain metastases were retrieved. The least absolute shrinkage and selection operator (LASSO) regression model demonstrated the best overall performance on the hold-out test set with an area under the receiver operating characteristic curve of 0.92 (95% CI, 0.89 to 0.94), accuracy of 83% (95% CI, 80% to 87%), calibration intercept of -0.06 (95% CI, -0.14 to 0.01), and calibration slope of 1.06 (95% CI, 0.95 to 1.17).
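
The four reported metrics can be made concrete with a small worked example. The abstract does not state how the calibration intercept and slope were computed, so the sketch below assumes one common definition: the slope is the coefficient from a logistic regression of the outcome on the logit of the predicted probability, and the intercept is estimated with that logit as a fixed offset (slope held at 1). The labels and predicted probabilities are invented toy values, not study data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given probability threshold."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, y_true))
    return hits / len(y_true)

def calibration_slope(y_true, scores, iters=25):
    """Fit y ~ a + b * logit(p) by Newton's method and return the slope b."""
    x = [logit(p) for p in scores]
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = sigmoid(a + b * xi)
            r, v = yi - mu, mu * (1.0 - mu)
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return b

def calibration_intercept(y_true, scores, iters=25):
    """Fit y ~ a + logit(p) with the slope fixed at 1 and return the intercept a."""
    x = [logit(p) for p in scores]
    a = 0.0
    for _ in range(iters):
        g = h = 0.0
        for xi, yi in zip(x, y_true):
            mu = sigmoid(a + xi)
            g += yi - mu
            h += mu * (1.0 - mu)
        a += g / h
    return a

# Toy example: predictions of 0.1 and 0.9 where the observed rates are 0.2 and 0.8.
y_true = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
scores = [0.1] * 5 + [0.9] * 5

print("AUC:", auc(y_true, scores))
print("Accuracy:", accuracy(y_true, scores))
print("Calibration intercept:", calibration_intercept(y_true, scores))
print("Calibration slope:", calibration_slope(y_true, scores))
```

In this toy case the predictions are more extreme than the observed rates, so the slope comes out at about 0.63 (below 1), while the intercept is essentially 0 because the overall predicted and observed event rates match. Under this definition, values near 0 and 1, such as the reported -0.06 and 1.06, indicate predictions that are close to well calibrated.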

CONCLUSION

Among various NLP techniques, the bag-of-words approach combined with a LASSO regression model demonstrated the best overall performance in extracting binary outcomes from free-text clinical reports. This study provides a framework for the development of machine learning-based NLP models as well as a clinical vignette of patients diagnosed with brain metastases.

Similar articles

Natural language processing for automated quantification of bone metastases reported in free-text bone scintigraphy reports.
Acta Oncol. 2020 Dec;59(12):1455-1460. doi: 10.1080/0284186X.2020.1819563. Epub 2020 Sep 12.

Natural language processing of head CT reports to identify intracranial mass effect: CTIME algorithm.
Am J Emerg Med. 2022 Jan;51:388-392. doi: 10.1016/j.ajem.2021.11.001. Epub 2021 Nov 9.

Cited by

Classifying Stereotactic Radiosurgery Patients by Primary Diagnosis Using Natural Language Processing of Clinical Notes.
JCO Clin Cancer Inform. 2025 Jun;9:e2400268. doi: 10.1200/CCI-24-00268. Epub 2025 Jun 13.

Natural Language Processing of Radiology Reports to Assess Survival in Patients with Advanced Melanoma.
Cancers (Basel). 2025 May 7;17(9):1595. doi: 10.3390/cancers17091595.

Using Generative AI to Extract Structured Information from Free Text Pathology Reports.
J Med Syst. 2025 Mar 13;49(1):36. doi: 10.1007/s10916-025-02167-2.

Text mining of verbal autopsy narratives to extract mortality causes and most prevalent diseases using natural language processing.
PLoS One. 2024 Sep 19;19(9):e0308452. doi: 10.1371/journal.pone.0308452. eCollection 2024.

Automatic Detection of Distant Metastasis Mentions in Radiology Reports in Spanish.
JCO Clin Cancer Inform. 2024 Jan;8:e2300130. doi: 10.1200/CCI.23.00130.

Developing a Cancer Digital Twin: Supervised Metastases Detection From Consecutive Structured Radiology Reports.
Front Artif Intell. 2022 Mar 2;5:826402. doi: 10.3389/frai.2022.826402. eCollection 2022.
