Saeed Hassanpour, Curtis P. Langlotz, Timothy J. Amrhein, Nicholas T. Befera, Matthew P. Lungren
1 Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA.
2 Department of Radiology, Stanford University School of Medicine, Stanford University Medical Center, 725 Welch Rd, Rm 1675, Stanford, CA 94305-5913.
AJR Am J Roentgenol. 2017 Apr;208(4):750-753. doi: 10.2214/AJR.16.16128. Epub 2017 Jan 31.
The purpose of this study is to evaluate the performance of a natural language processing (NLP) system in classifying a database of free-text knee MRI reports at two separate academic radiology practices.
An NLP system that uses terms and patterns in manually classified narrative knee MRI reports was constructed. The NLP system was trained and tested on expert-classified knee MRI reports from two major health care organizations. Radiology reports in the training set were modeled as vectors, and a support vector machine framework was used to train the classifier. A separate test set from each organization was used to evaluate the performance of the system, both within and across organizations. Standard evaluation metrics, such as accuracy, precision, recall, and F1 score (i.e., the harmonic mean of precision and recall), and their respective 95% CIs were used to measure the efficacy of our classification system.
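The modeling step described above (reports represented as vectors, then a support vector machine trained on them) can be illustrated with a minimal sketch. The abstract does not specify the features, kernel, or solver used in the study, so everything here is an assumption for illustration: the bag-of-words representation, the Pegasos-style subgradient training of a linear SVM without a bias term, the four toy report sentences, and the binary labels (+1 for a hypothetical "clinically significant finding" class, -1 otherwise).

```python
import random
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (hypothetical preprocessing).
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(docs):
    # Map each distinct training token to a vector index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text, vocab):
    # L2-normalized term-count vector; tokens outside the vocabulary are ignored.
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    vec = [0.0] * len(vocab)
    for tok, c in counts.items():
        vec[vocab[tok]] = float(c)
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    # Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - 1.0 / t) * wj for wj in w]  # shrink step from the L2 penalty
            if margin < 1.0:  # sample violates the margin: move w toward y_i * x_i
                eta = 1.0 / (lam * t)
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(text, vocab, w):
    score = sum(wj * xj for wj, xj in zip(w, vectorize(text, vocab)))
    return 1 if score > 0 else -1


# Toy training data, invented for illustration only (not from the study).
train_reports = [
    "Complex tear of the medial meniscus.",
    "Full-thickness tear of the anterior cruciate ligament.",
    "Menisci and ligaments are intact.",
    "Normal knee MRI without acute abnormality.",
]
train_labels = [1, 1, -1, -1]

vocab = build_vocab(train_reports)
X_train = [vectorize(r, vocab) for r in train_reports]
weights = train_linear_svm(X_train, train_labels)
```

Calling `predict("Ligaments are intact", vocab, weights)` then classifies an unseen sentence with the learned weights. In the cross-institutional setting the abstract describes, the analogous step would be applying a vocabulary and weights fitted on one organization's reports to the other organization's test set without refitting.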
The accuracy for radiology reports that belonged to the model's clinically significant concept classes after training on data from the same institution was good, yielding an F1 score greater than 90% (95% CI, 84.6-97.3%). Performance of the classifier on cross-institutional application without institution-specific training data yielded F1 scores of 77.6% (95% CI, 69.5-85.7%) and 90.2% (95% CI, 84.5-95.9%) at the two organizations studied.
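As a rough illustration of how an F1 score and an accompanying 95% CI of this kind can be computed, the sketch below derives precision, recall, and F1 from confusion counts and then applies a normal-approximation (Wald) interval. The abstract does not state how the study's intervals were obtained; the cross-institutional intervals are symmetric about their point estimates (77.6 ± 8.1 and 90.2 ± 5.7), which is consistent with a normal approximation but does not confirm it. Treating F1 as a simple proportion is itself a simplification, and the counts and sample size below are hypothetical.

```python
import math


def precision_recall_f1(tp, fp, fn):
    # Precision, recall, and their harmonic mean (F1) from confusion counts.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def wald_interval(p, n, z=1.96):
    # Normal-approximation interval for a proportion p estimated from n cases,
    # clipped to [0, 1]. z = 1.96 corresponds to a 95% interval.
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts for illustration only (not taken from the study).
precision, recall, f1 = precision_recall_f1(tp=45, fp=5, fn=5)
ci_low, ci_high = wald_interval(f1, n=100)
print(f"F1 = {f1:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

With these made-up counts, precision and recall are both 0.9, so F1 is 0.9, and the half-width is 1.96 × √(0.9 × 0.1 / 100) = 0.0588. A bootstrap over test reports would be a common alternative when the symmetric approximation is a poor fit.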
The results show excellent accuracy by the NLP machine learning classifier in classifying free-text knee MRI reports, supporting the institution-independent reproducibility of knee MRI report classification. Furthermore, the machine learning classifier performed well on free-text knee MRI reports from another institution. These data support the feasibility of multiinstitutional classification of radiologic imaging text reports with a single machine learning classifier without requiring institution-specific training data.