Department of Radiology and Biomedical Imaging, Yale School of Medicine, New Haven, CT, USA.
Department of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA.
Eur Radiol. 2020 Jul;30(7):4125-4133. doi: 10.1007/s00330-020-06721-z. Epub 2020 Feb 26.
The highly structured nature of medical reports makes them well suited to automated large-scale patient identification. This study aimed to develop a natural language processing (NLP) model to retrospectively identify patients with a presence or history of carotid stenosis (CS) from their ultrasound reports.
Ultrasound reports from our institution between January 2016 and December 2017 were selected. To process the texts, we developed a parser that divides the raw text into fields. For the baseline method, we used bag-of-n-grams and term frequency-inverse document frequency (TF-IDF) as features for linear classifiers, with logistic regression serving as the baseline model. Convolutional and recurrent neural networks (CNN; RNN) with an attention mechanism were then applied to the dataset to improve classification accuracy.
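The preprocessing and baseline features described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the field headers in `FIELD_HEADERS`, the function names, the tokenization rule, and the choice of a plain `tf * log(N / df)` weighting are all assumptions, since the abstract does not specify them. Training the logistic regression on the resulting vectors is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical field headers; the abstract does not list the fields
# that the authors' parser actually extracts.
FIELD_HEADERS = ("INDICATION", "FINDINGS", "IMPRESSION")


def parse_fields(raw_text):
    """Split a raw report into {header: text} using 'HEADER:' markers."""
    pattern = re.compile(
        r"\b(" + "|".join(FIELD_HEADERS) + r")\s*:", re.IGNORECASE
    )
    matches = list(pattern.finditer(raw_text))
    fields = {}
    for i, match in enumerate(matches):
        # A field runs from the end of its header to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw_text)
        fields[match.group(1).upper()] = raw_text[match.end():end].strip()
    return fields


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max (lowercased)."""
    tokens = re.findall(r"[a-z0-9%]+", text.lower())
    grams = []
    for n in range(1, n_max + 1):
        grams.extend(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return grams


def tfidf(docs, n_max=2):
    """Compute sparse TF-IDF vectors (one dict per document).

    Uses the textbook form tf * log(N / df); libraries often apply
    smoothing, so their values will differ from these.
    """
    counts = [Counter(ngrams(doc, n_max)) for doc in docs]
    doc_freq = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {g: (k / total) * math.log(n_docs / doc_freq[g]) for g, k in c.items()}
        )
    return vectors
```

With this weighting, an n-gram that occurs in every report (for example "stenosis" in a corpus where all reports mention it) receives a weight of zero, so the classifier is driven by the terms that distinguish reports, such as "no" or "severe".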
We had 1220 ultrasound reports for training and 307 for testing, for a total of 1527 reports. For predicting history of CS, both the CNN and RNN-attention models had significantly higher specificity than logistic regression, and RNN-attention also had a significantly higher F1 score and accuracy. For predicting presence of CS, all models achieved above 93% accuracy. RNN-attention achieved 95.4% accuracy, although the difference from logistic regression was not statistically significant; its specificity, however, was significantly higher than that of logistic regression.
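For readers less familiar with the metrics reported above, the sketch below shows how accuracy, specificity, and F1 score follow from a binary confusion matrix. The function name and the toy labels are illustrative assumptions and are not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, precision, and F1.

    Labels are 1 (condition present) or 0 (condition absent).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Toy example with made-up labels (not study data).
metrics = binary_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

Specificity depends only on the reports without the condition, which is why a model can improve on it significantly while overall accuracy changes little.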
We developed linear, CNN, and RNN models to predict history and presence of CS from ultrasound reports. We have demonstrated NLP to be an efficient, accurate approach for large-scale retrospective patient identification, with applications in long-term follow-up of patients and clinical research studies.
• Natural language processing models using both linear classifiers and neural networks can achieve good performance, with an overall accuracy above 90% in predicting history and presence of carotid stenosis.
• Convolutional and recurrent neural networks, especially with additional features such as field awareness and an attention mechanism, outperform traditional linear classifiers.
• NLP is shown to be an efficient approach for large-scale retrospective patient identification, with applications in long-term follow-up of patients and further clinical research studies.