METHODS

We used electronic health record (EHR) data of patients included in the Second Manifestations of ARTerial disease (SMART) study. We propose a deep learning-based multimodal architecture for our text mining pipeline that integrates neural text representations with preprocessed clinical predictors to predict the recurrence of major cardiovascular events in cardiovascular patients. Text preprocessing, including cleaning and stemming, was first applied to filter unwanted text out of X-ray radiology reports. Thereafter, text representation methods were used to represent the unstructured radiology reports numerically as vectors. Subsequently, these text representations were added to prediction models to assess their clinical relevance. In this step, we applied logistic regression, support vector machine (SVM), multilayer perceptron neural network, convolutional neural network, long short-term memory (LSTM), and bidirectional LSTM (BiLSTM) deep neural network models.
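The preprocessing and representation steps can be illustrated with a minimal sketch. It is not the pipeline used in the study: the stop-word list, the naive suffix-stripping stemmer, the example reports, and the function names (`preprocess`, `bag_of_words`, `multimodal_features`) are all assumptions made for illustration. It shows a report being cleaned and stemmed, turned into a bag-of-words count vector, and concatenated with structured predictors (here only age and sex) to form one multimodal feature vector.

```python
import re
from collections import Counter

# Hypothetical stop-word list and suffix rules, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "is", "are", "and", "with", "in"}
SUFFIXES = ("ings", "ing", "ed", "es", "s")


def clean(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token):
    """Naive suffix stripping; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, remove stop words, and stem the remaining tokens."""
    return [stem(tok) for tok in clean(text) if tok not in STOP_WORDS]


def build_vocabulary(documents):
    """Map each distinct stemmed token to a column index."""
    tokens = sorted({tok for doc in documents for tok in preprocess(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocab):
    """Count vector of the report over the fixed vocabulary."""
    vector = [0] * len(vocab)
    for tok, count in Counter(preprocess(text)).items():
        if tok in vocab:
            vector[vocab[tok]] = count
    return vector


def multimodal_features(text, clinical, vocab):
    """Concatenate the text vector with structured predictors."""
    return bag_of_words(text, vocab) + list(clinical)


# Made-up example reports, not data from the SMART study.
reports = [
    "No acute cardiopulmonary abnormalities.",
    "Enlarged heart with pleural effusions.",
]
vocab = build_vocabulary(reports)

# Assumed structured predictors: age in years and sex coded as 0/1.
features = multimodal_features(reports[1], [63.0, 1.0], vocab)
print(features)
```

The resulting vector could be passed to any of the classifiers listed above. The neural models would typically replace the count vector with learned word embeddings, which this sketch does not attempt.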

RESULTS

We performed various experiments to evaluate the added value of the text in the prediction of major cardiovascular events. The two main scenarios were the integration of radiology reports (1) with classical clinical predictors and (2) with only age and sex in the case of unavailable clinical predictors. In total, data of 5603 patients were used with 5-fold cross-validation to train the models. In the first scenario, the multimodal BiLSTM (MI-BiLSTM) model achieved an area under the curve (AUC) of 84.7%, a misclassification rate of 14.3%, and an F1 score of 83.8%. In this scenario, the SVM model, trained on clinical variables and a bag-of-words representation, achieved the lowest misclassification rate, 12.2%. In the case of unavailable clinical predictors, the MI-BiLSTM model trained on radiology reports and demographic (age and sex) variables reached an AUC, F1 score, and misclassification rate of 74.5%, 70.8%, and 20.4%, respectively.
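For readers unfamiliar with the reported metrics, the following sketch shows one way to compute the misclassification rate, F1 score, and AUC, and to split 5603 samples into 5 folds. The labels and scores are made up, the 0.5 decision threshold is an assumption, and the contiguous unshuffled split is a simplification; the abstract does not state how the folds were drawn.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the truth."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0)
             for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Made-up labels and model scores, not results from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

print(misclassification_rate(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(auc(y_true, scores))

# Fold sizes for the 5603 patients mentioned above.
folds = list(k_fold_indices(5603, k=5))
print([len(test) for _, test in folds])
```

In a cross-validation run, each metric would be computed on the held-out fold and then averaged across the five folds.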

CONCLUSIONS

Using the case study of routine care chest X-ray radiology reports, we demonstrated the clinical relevance of integrating text features and classical predictors in our text mining pipeline for cardiovascular risk prediction. The MI-BiLSTM model with word embedding representation appeared to perform well when trained on text data integrated with the clinical variables from the SMART study. Our results, mined from chest X-ray reports, showed that models using text data in addition to laboratory values outperform those using only known clinical predictors.
