Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula.

Affiliations

Institute for Diagnostic and Interventional Radiology, Hannover Medical School, Hannover, Germany.

Centre for Information Management (ZIMt), Hannover Medical School, Hannover, Germany.

Publication details

Rofo. 2023 Aug;195(8):713-719. doi: 10.1055/a-2061-6562. Epub 2023 May 9.

Abstract

PURPOSE

Radiology reports mostly contain free text, which makes it challenging to obtain structured data. Natural language processing (NLP) techniques transform free-text reports into machine-readable document vectors, which are important for creating reliable, scalable methods of data analysis. The aim of this study is to classify unstructured radiograph reports according to the presence of a fracture of the distal fibula and to identify the best-performing text mining method.
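To illustrate what a "machine-readable document vector" is, the following minimal sketch builds bag-of-words (BOW) count vectors in plain Python. The tokenizer (lowercasing plus a `\w+` regular expression), the sorted vocabulary, and the German report snippets are all hypothetical and only illustrate the idea. They are not taken from the study, whose preprocessing is not described in the abstract.

```python
import re
from collections import Counter


def tokenize(report):
    # Lowercase and keep runs of word characters; in Python 3, \w also matches umlauts such as "ä"
    return re.findall(r"\w+", report.lower())


def build_vocabulary(reports):
    # Map every distinct token in the corpus to a fixed column index (sorted for reproducibility)
    tokens = sorted({tok for report in reports for tok in tokenize(report)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bow_vector(report, vocab):
    # Count each in-vocabulary token; tokens unseen when the vocabulary was built are ignored
    vec = [0] * len(vocab)
    for tok, count in Counter(tokenize(report)).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


# Hypothetical German report snippets, not from the study dataset
corpus = ["Fraktur der distalen Fibula.", "Keine Fraktur."]
vocab = build_vocabulary(corpus)
print(vocab)  # {'der': 0, 'distalen': 1, 'fibula': 2, 'fraktur': 3, 'keine': 4}
print(bow_vector("Keine Fraktur. Keine Luxation.", vocab))  # [0, 0, 0, 1, 2]
```

Every report becomes a vector of the same length, one column per vocabulary word. This fixed-length form is what downstream classifiers such as NN, SVM, or LR require. Word order is discarded, which is the defining simplification of the BOW model.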

MATERIALS & METHODS

We established a novel German-language report dataset: a dedicated search engine was used to identify radiographs of the ankle, and the reports were manually labeled according to whether they described a fracture of the distal fibula. These data were used to build a machine learning pipeline that implemented the text representation methods bag-of-words (BOW), term frequency-inverse document frequency (TF-IDF), principal component analysis (PCA), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), and document embedding (doc2vec). The extracted document vectors were used to train neural networks (NN), support vector machines (SVM), and logistic regression (LR) to recognize distal fibula fractures. The results were compared via cross-tabulations of accuracy (acc) and area under the curve (AUC).
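One branch of such a pipeline (TF-IDF features feeding a logistic regression) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses one common smoothed TF-IDF variant, idf(t) = ln((1 + N) / (1 + df(t))) + 1, with L2-normalized vectors. The classifier is a logistic regression trained by plain batch gradient descent. The learning rate, epoch count, reports, and labels are hypothetical. PCA, NMF, LDA, doc2vec, NN, and SVM are not shown.

```python
import math
import re
from collections import Counter


def tokenize(report):
    return re.findall(r"\w+", report.lower())


def fit_idf(reports):
    # Assumed smoothed variant: idf(t) = ln((1 + N) / (1 + df(t))) + 1
    n_docs = len(reports)
    df = Counter(tok for report in reports for tok in set(tokenize(report)))
    vocab = {tok: idx for idx, tok in enumerate(sorted(df))}
    idf = [0.0] * len(vocab)
    for tok, idx in vocab.items():
        idf[idx] = math.log((1 + n_docs) / (1 + df[tok])) + 1
    return vocab, idf


def tfidf_vector(report, vocab, idf):
    # Raw term count times idf, then L2-normalized so report length does not dominate
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(report)).items():
        if tok in vocab:
            vec[vocab[tok]] = count * idf[vocab[tok]]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    # Hypothetical hyperparameters; batch gradient descent on the mean log-loss, no regularization
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical labeled reports (1 = distal fibula fracture, 0 = no fracture)
reports = [
    "Fraktur der distalen Fibula.",
    "Weber-B-Fraktur der Fibula.",
    "Dislozierte Fraktur distale Fibula.",
    "Keine Fraktur.",
    "Kein Nachweis einer Fraktur.",
    "Regelrechte Stellung, keine Fraktur.",
]
labels = [1, 1, 1, 0, 0, 0]

vocab, idf = fit_idf(reports)
X = [tfidf_vector(r, vocab, idf) for r in reports]
w, b = train_logistic_regression(X, labels)
print(round(predict_proba(tfidf_vector("Fraktur der Fibula.", vocab, idf), w, b), 3))
```

Swapping `tfidf_vector` for raw counts gives the BOW variant. A dimensionality-reduction step such as PCA or NMF would sit between the vectorizer and the classifier.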

RESULTS

In total, 3268 radiograph reports were included, of which 1076 described a fracture of the distal fibula. Comparison of the text representation methods showed that BOW achieved the best results (AUC = 0.98; acc = 0.97), followed by TF-IDF (AUC = 0.97; acc = 0.96), NMF (AUC = 0.93; acc = 0.92), PCA (AUC = 0.92; acc = 0.90), LDA (AUC = 0.91; acc = 0.89) and doc2vec (AUC = 0.90; acc = 0.88). When comparing the different classifiers, NN (AUC = 0.91) proved to be superior to SVM (AUC = 0.87) and LR (AUC = 0.85).
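The two reported metrics can be computed from manual labels and classifier scores as sketched below. The abstract does not state the decision threshold or the validation scheme, so the 0.5 threshold and the four example scores are assumptions for illustration. AUC is computed here via its pairwise (Mann-Whitney) interpretation. This is equivalent to the area under the ROC curve but is not necessarily how the study computed it.

```python
def accuracy(labels, scores, threshold=0.5):
    # Fraction of reports whose thresholded score matches the manual label (threshold is assumed)
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    # AUC as the probability that a randomly chosen positive report scores higher
    # than a randomly chosen negative one (ties count as one half); O(P * N) pairs
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative report")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, scores))  # 0.5
print(roc_auc(labels, scores))   # 0.75
```

AUC depends only on how the scores rank positives against negatives, whereas accuracy depends on the chosen threshold. The class balance also matters when reading accuracy. With 1076 of 3268 reports positive, a trivial classifier that always predicts "no fracture" would already reach an accuracy of about 2192 / 3268 ≈ 0.67.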

CONCLUSION

An automated classification of unstructured reports of ankle radiographs can reliably identify reports describing a fracture of the distal fibula. The BOW model is a particularly suitable feature extraction method.

KEY POINTS

· The aim was to classify unstructured radiograph reports according to distal fibula fractures.
· Our automated classification system can reliably detect fractures of the distal fibula.
· A particularly suitable feature extraction method is the BOW model.

CITATION FORMAT

· Dewald CL, Balandis A, Becker LS et al. Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula. Fortschr Röntgenstr 2023; 195: 713 - 719.
