Integrating Natural Language Processing and Machine Learning Algorithms to Categorize Oncologic Response in Radiology Reports.

Author information

Department of Radiology, Perelman School of Medicine, Hospital of the University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, USA.

Musculoskeletal Imaging Division, Department of Radiology, Hospital of the University of Pennsylvania, 3400 Spruce St., 1 Silverstein, Philadelphia, PA, 19104, USA.

Publication information

J Digit Imaging. 2018 Apr;31(2):178-184. doi: 10.1007/s10278-017-0027-x.

Abstract

A significant volume of medical data remains unstructured. Natural language processing (NLP) and machine learning (ML) techniques have been shown to successfully extract insights from radiology reports. However, the codependent effects of NLP and ML in this context have not been well studied. Between April 1, 2015 and November 1, 2016, 9418 cross-sectional abdomen/pelvis CT and MR examinations containing our internal structured reporting element for cancer were separated into four categories: Progression, Stable Disease, Improvement, or No Cancer. We combined each of three NLP techniques with five ML algorithms to predict the assigned label using the unstructured report text and compared the performance of each combination. The three NLP algorithms included term frequency-inverse document frequency (TF-IDF), term frequency weighting (TF), and 16-bit feature hashing. The ML algorithms included logistic regression (LR), random decision forest (RDF), one-vs-all support vector machine (SVM), one-vs-all Bayes point machine (BPM), and fully connected neural network (NN). The best-performing NLP model consisted of tokenized unigrams and bigrams with TF-IDF. Increasing N-gram length yielded little to no added benefit for most ML algorithms. With all parameters optimized, SVM had the best performance on the test dataset, with an average accuracy of 90.6% and an F score of 0.813. The interplay between ML and NLP algorithms and their effect on interpretation accuracy is complex. The best accuracy is achieved when both algorithms are optimized concurrently.
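The three text featurization schemes named in the abstract (TF, TF-IDF, and 16-bit feature hashing over unigrams and bigrams) can be illustrated with a minimal, standard-library-only sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: whitespace tokenization, the `idf = ln(N / df)` variant of TF-IDF, MD5 as the hash function, the function names, and the three example report strings.

```python
import hashlib
import math
from collections import Counter


def ngrams(text, max_n=2):
    """Lowercase, split on whitespace, and return all 1..max_n-grams."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tf_vectors(docs, max_n=2):
    """Term-frequency weighting: one Counter of raw n-gram counts per document."""
    return [Counter(ngrams(doc, max_n)) for doc in docs]


def tfidf_vectors(docs, max_n=2):
    """TF-IDF using idf = ln(N / df), one common variant (assumed here)."""
    tfs = tf_vectors(docs, max_n)
    # Iterating a Counter yields each distinct term once, so this counts documents.
    doc_freq = Counter(term for tf in tfs for term in tf)
    n_docs = len(docs)
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in tf.items()}
        for tf in tfs
    ]


def hashed_vector(doc, bits=16, max_n=2):
    """Feature hashing: count n-grams in one of 2**bits buckets (65536 for 16 bits)."""
    buckets = Counter()
    for term in ngrams(doc, max_n):
        # MD5 is an arbitrary deterministic choice; Python's built-in hash() is salted.
        digest = hashlib.md5(term.encode("utf-8")).digest()
        buckets[int.from_bytes(digest[:4], "big") % (1 << bits)] += 1
    return buckets


# Hypothetical report snippets, not taken from the study data.
reports = [
    "liver lesion has increased in size",
    "liver lesion is stable in size",
    "no evidence of malignancy",
]
tfidf = tfidf_vectors(reports)
print(tfidf[0]["increased"])     # term unique to one report: weight ln(3)
print(tfidf[0]["liver lesion"])  # bigram shared by two reports: weight ln(3/2)
print(len(hashed_vector(reports[0])))  # number of occupied hash buckets
```

In this sketch, terms that occur in fewer reports receive larger TF-IDF weights, while feature hashing trades an explicit vocabulary for a fixed-width vector in which distinct n-grams may collide. Any of the resulting vectors could then be passed to a classifier such as those listed in the abstract.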
