Automated Radiology-Arthroscopy Correlation of Knee Meniscal Tears Using Natural Language Processing Algorithms.

Affiliation

Department of Radiology, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114.

Publication Information

Acad Radiol. 2022 Apr;29(4):479-487. doi: 10.1016/j.acra.2021.01.017. Epub 2021 Feb 11.

DOI: 10.1016/j.acra.2021.01.017

PMID: 33583713

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8355247/

Abstract

RATIONALE AND OBJECTIVES

Train and apply natural language processing (NLP) algorithms for automated radiology-arthroscopy correlation of meniscal tears.

MATERIALS AND METHODS

In this retrospective single-institution study, we trained supervised machine learning models (logistic regression, support vector machine, and random forest) to detect medial or lateral meniscus tears in free-text MRI reports. We trained and evaluated model performance with cross-validation using 3593 manually annotated knee MRI reports. To assess radiology-arthroscopy correlation, we then randomly partitioned this dataset 80:20 into training and test sets; 108 MRIs in the test set were followed by knee arthroscopy within 1 year. These free-text arthroscopy reports were also manually annotated. The NLP algorithms trained on the knee MRI training set were then evaluated on the MRI and arthroscopy report test sets. We assessed radiology-arthroscopy agreement by comparing the ensembled NLP-extracted findings with the manually annotated findings.
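The abstract does not describe the implementation, so the following is only a minimal standard-library Python sketch of the data handling outlined above: k-fold cross-validation indices, a random 80:20 split, and pairing test MRIs with a later arthroscopy. The record fields (`patient_id`, `exam_date`, `text`), the helper names, the choice of 5 folds, and reading "within 1 year" as 365 days are all hypothetical assumptions, and knee laterality is ignored for brevity.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Report:
    # Hypothetical record layout; real report metadata will differ.
    patient_id: str
    exam_date: date
    text: str


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    k=5 is an assumption; the abstract does not state the number of folds.
    """
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def split_80_20(reports, seed=0):
    """Randomly partition reports into ~80% training and ~20% test."""
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def pair_with_arthroscopy(test_mris, arthroscopies, window_days=365):
    """Pair each test MRI with the same patient's earliest arthroscopy
    performed on or after the MRI date and within window_days.

    Knee laterality is ignored here for brevity (an assumption).
    """
    pairs = []
    for mri in test_mris:
        latest = mri.exam_date + timedelta(days=window_days)
        candidates = [
            a for a in arthroscopies
            if a.patient_id == mri.patient_id
            and mri.exam_date <= a.exam_date <= latest
        ]
        if candidates:
            pairs.append((mri, min(candidates, key=lambda a: a.exam_date)))
    return pairs
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global random state.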

RESULTS

The NLP models showed high cross-validation performance for meniscal tear detection on knee MRI reports (medial meniscus F1 scores 0.93-0.94, lateral meniscus F1 scores 0.86-0.88). When these algorithms were evaluated on arthroscopy reports, despite never training on arthroscopy reports, performance was similar, though higher with model ensembling (medial meniscus F1 score 0.97, lateral meniscus F1 score 0.99). However, ensembling did not improve performance on knee MRI reports. In the radiology-arthroscopy test set, the ensembled NLP models were able to detect mismatches between MRI and arthroscopy reports with sensitivity 79% and specificity 87%.
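The abstract does not say how the three models were ensembled or exactly how a mismatch was defined. The sketch below is a hedged, standard-library illustration that assumes hard majority voting over binary per-meniscus tear labels, and defines a mismatch as disagreement between the MRI label and the arthroscopy label for the same meniscus. All function names are hypothetical. Sensitivity and specificity treat "mismatch" as the positive class, comparing NLP-derived mismatch flags against flags derived from the manual annotations; the toy labels are invented for illustration and are not study data.

```python
def majority_vote(*model_predictions):
    """Hard-voting ensemble: predict 1 when more than half the models predict 1."""
    n_models = len(model_predictions)
    return [int(2 * sum(votes) > n_models) for votes in zip(*model_predictions)]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no true positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mismatch_flags(mri_labels, arthroscopy_labels):
    """1 where the MRI tear label disagrees with the arthroscopy tear label."""
    return [int(m != a) for m, a in zip(mri_labels, arthroscopy_labels)]


def sensitivity_specificity(true_mismatch, predicted_mismatch):
    """Mismatch is the positive class; assumes both classes occur."""
    tp, fp, fn, tn = confusion_counts(true_mismatch, predicted_mismatch)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy per-report medial-meniscus labels (1 = tear), not study data.
    lr, svm, rf = [1, 0, 1, 1, 0], [1, 1, 0, 1, 0], [0, 0, 1, 1, 1]
    ensembled = majority_vote(lr, svm, rf)               # [1, 0, 1, 1, 0]
    print(f1_score([1, 0, 1, 0, 0], ensembled))          # 0.8

    true_mm = mismatch_flags([1, 0, 1, 1], [1, 1, 1, 0])  # manual: [0, 1, 0, 1]
    nlp_mm = mismatch_flags([1, 0, 0, 1], [1, 1, 1, 1])   # NLP: [0, 1, 1, 0]
    print(sensitivity_specificity(true_mm, nlp_mm))       # (0.5, 0.5)
```

With three models and binary labels, hard voting can never tie, which is one practical reason an odd number of voters is convenient.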

CONCLUSION

Radiology-arthroscopy correlation can be automated for knee meniscal tears using NLP algorithms, an approach that shows promise for education and quality improvement.