Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula.

Affiliations

Institute for Diagnostic and Interventional Radiology, Hannover Medical School, Hannover, Germany.

Centre for Information Management (ZIMt), Hannover Medical School, Hannover, Germany.

Publication information

Rofo. 2023 Aug;195(8):713-719. doi: 10.1055/a-2061-6562. Epub 2023 May 9.

DOI: 10.1055/a-2061-6562

PMID: 37160146

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10368466/

Abstract

PURPOSE

Radiology reports mostly contain free-text, which makes it challenging to obtain structured data. Natural language processing (NLP) techniques transform free-text reports into machine-readable document vectors that are important for creating reliable, scalable methods for data analysis. The aim of this study is to classify unstructured radiograph reports according to fractures of the distal fibula and to find the best text mining method.
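As a rough illustration of what "document vectors" means here, the following minimal sketch builds bag-of-words and TF-IDF vectors in plain Python. The example reports are hypothetical English stand-ins (the study used German reports), the helper names are invented for this sketch, and the idf formula `log(N / df)` is only one of several common variants; none of this is taken from the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (including German umlauts)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def build_vocabulary(docs):
    """Sorted list of all distinct tokens across the tokenized documents."""
    return sorted({tok for doc in docs for tok in doc})


def bow_vector(doc, vocab):
    """Bag-of-words: raw term counts, one entry per vocabulary term."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    """TF-IDF using the plain variant idf = log(N / df) (an assumption)."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] * idf[term] for term in vocab])
    return vectors


# Hypothetical report snippets, for illustration only.
reports = [
    "Fracture of the distal fibula.",
    "No fracture. Regular joint alignment.",
    "Distal fibula intact, no fracture.",
]
docs = [tokenize(r) for r in reports]
vocab = build_vocabulary(docs)
bow = [bow_vector(d, vocab) for d in docs]
tfidf = tfidf_vectors(docs, vocab)
```

In this toy corpus the word "fracture" occurs in every report, so its TF-IDF weight is zero under this idf variant, while its bag-of-words count is unchanged. The example only shows how the two representations can differ on the same text.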

MATERIALS & METHODS

We established a novel German language report dataset: a designated search engine was used to identify radiographs of the ankle and the reports were manually labeled according to fractures of the distal fibula. This data was used to establish a machine learning pipeline, which implemented the text representation methods bag-of-words (BOW), term frequency-inverse document frequency (TF-IDF), principal component analysis (PCA), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), and document embedding (doc2vec). The extracted document vectors were used to train neural networks (NN), support vector machines (SVM), and logistic regression (LR) to recognize distal fibula fractures. The results were compared via cross-tabulations of the accuracy (acc) and area under the curve (AUC).
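The two evaluation metrics named above can be sketched in plain Python as follows. The labels, scores, and the 0.5 decision threshold are hypothetical values chosen for illustration; the abstract does not state how the authors computed the metrics or which threshold they used.

```python
def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label equals the manual label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win; this pairwise form equals the area under the
    ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fracture described) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen threshold, whereas AUC summarizes the ranking of scores across all thresholds, which is one reason to report both.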

RESULTS

In total, 3268 radiograph reports were included, of which 1076 described a fracture of the distal fibula. Comparison of the text representation methods showed that BOW achieved the best results (AUC = 0.98; acc = 0.97), followed by TF-IDF (AUC = 0.97; acc = 0.96), NMF (AUC = 0.93; acc = 0.92), PCA (AUC = 0.92; acc = 0.9), LDA (AUC = 0.91; acc = 0.89) and doc2vec (AUC = 0.9; acc = 0.88). When comparing the different classifiers, NN (AUC = 0.91) proved to be superior to SVM (AUC = 0.87) and LR (AUC = 0.85).
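To put the reported accuracies in context, the class balance implied by the counts above can be worked out directly. This short sketch assumes a binary labeling in which every report that does not describe a distal fibula fracture counts as negative; the abstract does not spell this out, and the "majority baseline" is an illustrative reference point, not a result from the study.

```python
total_reports = 3268      # from the abstract
fracture_reports = 1076   # from the abstract

# Assumption: all remaining reports are treated as "no distal fibula fracture".
no_fracture_reports = total_reports - fracture_reports

prevalence = fracture_reports / total_reports
# Accuracy of a trivial classifier that always predicts "no fracture".
majority_baseline_acc = no_fracture_reports / total_reports

print(f"prevalence: {prevalence:.3f}")
print(f"majority baseline accuracy: {majority_baseline_acc:.3f}")
```

Under that assumption, roughly one third of the reports are positive, so an always-negative classifier would already reach about 67% accuracy.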

CONCLUSION

An automated classification of unstructured reports of radiographs of the ankle can reliably detect findings of fractures of the distal fibula. A particularly suitable feature extraction method is the BOW model.

KEY POINTS

· The aim was to classify unstructured radiograph reports according to distal fibula fractures.
· Our automated classification system can reliably detect fractures of the distal fibula.
· A particularly suitable feature extraction method is the BOW model.

CITATION FORMAT

· Dewald CL, Balandis A, Becker LS et al. Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula. Fortschr Röntgenstr 2023; 195: 713 - 719.
