Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula.

Affiliations

Institute for Diagnostic and Interventional Radiology, Hannover Medical School, Hannover, Germany.

Centre for Information Management (ZIMt), Hannover Medical School, Hannover, Germany.

Publication information

Rofo. 2023 Aug;195(8):713-719. doi: 10.1055/a-2061-6562. Epub 2023 May 9.

Abstract

PURPOSE

Radiology reports consist mostly of free text, which makes it challenging to obtain structured data. Natural language processing (NLP) techniques transform free-text reports into machine-readable document vectors, which are important for creating reliable, scalable methods for data analysis. The aim of this study is to classify unstructured radiograph reports according to fractures of the distal fibula and to find the best text mining method.

MATERIALS & METHODS

We established a novel German language report dataset: a designated search engine was used to identify radiographs of the ankle and the reports were manually labeled according to fractures of the distal fibula. This data was used to establish a machine learning pipeline, which implemented the text representation methods bag-of-words (BOW), term frequency-inverse document frequency (TF-IDF), principal component analysis (PCA), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), and document embedding (doc2vec). The extracted document vectors were used to train neural networks (NN), support vector machines (SVM), and logistic regression (LR) to recognize distal fibula fractures. The results were compared via cross-tabulations of the accuracy (acc) and area under the curve (AUC).
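To make the two simplest text representations concrete, the following is a minimal sketch of how BOW and TF-IDF document vectors can be built. It is not the authors' pipeline: the three report snippets are invented for illustration, the tokenizer is a simple assumption, and the TF-IDF weighting is the textbook form `tf * log(N / df)`, since the abstract does not state which variant was used.

```python
import math
import re
from collections import Counter

# Toy German report snippets, invented for illustration only
# (they are NOT taken from the study dataset).
reports = [
    "Fraktur der distalen Fibula",
    "Keine Fraktur der Fibula",
    "Unauffälliger Befund",
]


def tokenize(text):
    """Lowercase the text and keep runs of German letters as tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


tokenized = [tokenize(report) for report in reports]

# The vocabulary fixes one vector dimension per distinct term.
vocabulary = sorted({token for doc in tokenized for token in doc})


def bow_vector(tokens):
    """Bag-of-words: raw count of each vocabulary term in one document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


bow = [bow_vector(doc) for doc in tokenized]

# Inverse document frequency: terms found in fewer documents weigh more.
n_docs = len(tokenized)
doc_freq = {term: sum(term in doc for doc in tokenized) for term in vocabulary}
idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}


def tfidf_vector(tokens):
    """TF-IDF: term count scaled by the term's inverse document frequency."""
    counts = Counter(tokens)
    return [counts[term] * idf[term] for term in vocabulary]


tfidf = [tfidf_vector(doc) for doc in tokenized]

print(vocabulary)
print(bow[0])
print([round(weight, 2) for weight in tfidf[0]])
```

In this sketch, "distalen" occurs in only one toy report and therefore receives a higher TF-IDF weight than "der", which occurs in two. Vectors of this kind are what a classifier such as LR, SVM, or NN would take as input.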

RESULTS

In total, 3268 radiograph reports were included, of which 1076 described a fracture of the distal fibula. Comparison of the text representation methods showed that BOW achieved the best results (AUC = 0.98; acc = 0.97), followed by TF-IDF (AUC = 0.97; acc = 0.96), NMF (AUC = 0.93; acc = 0.92), PCA (AUC = 0.92; acc = 0.9), LDA (AUC = 0.91; acc = 0.89) and doc2vec (AUC = 0.9; acc = 0.88). When comparing the different classifiers, NN (AUC = 0.91) proved to be superior to SVM (AUC = 0.87) and LR (AUC = 0.85).
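As a reading aid for the two metrics, the sketch below computes accuracy and AUC by hand and derives the majority-class baseline from the report counts given above. The labels and scores are hypothetical values chosen for illustration, the 0.5 decision threshold is an assumption, and the pairwise AUC formula is the standard rank-based definition rather than the authors' evaluation code.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of reports whose thresholded score matches the label."""
    predictions = [1 if score >= threshold else 0 for score in scores]
    correct = sum(pred == label for pred, label in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as half a win)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = fracture) and classifier scores, for illustration.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

print(accuracy(labels, scores))  # 6 of 8 correct at threshold 0.5
print(roc_auc(labels, scores))   # 14 of 15 positive-negative pairs ordered correctly

# Counts reported in the abstract: 1076 of 3268 reports describe a fracture.
total_reports = 3268
fracture_reports = 1076
prevalence = fracture_reports / total_reports
majority_baseline = (total_reports - fracture_reports) / total_reports
print(round(prevalence, 3), round(majority_baseline, 3))
```

A classifier that always predicted "no fracture" would reach an accuracy of roughly 0.67 on this class distribution, which puts the reported accuracies of 0.88 to 0.97 in context.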

CONCLUSION

An automated classification of unstructured reports of radiographs of the ankle can reliably detect findings of fractures of the distal fibula. A particularly suitable feature extraction method is the BOW model.

KEY POINTS

· The aim was to classify unstructured radiograph reports according to distal fibula fractures.
· Our automated classification system can reliably detect fractures of the distal fibula.
· A particularly suitable feature extraction method is the BOW model.

CITATION FORMAT

· Dewald CL, Balandis A, Becker LS et al. Automated Classification of Free-Text Radiology Reports: Using Different Feature Extraction Methods to Identify Fractures of the Distal Fibula. Fortschr Röntgenstr 2023; 195: 713 - 719.
