Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, Pennsylvania, USA.
J Am Med Inform Assoc. 2020 Jul 1;27(9):1343-1351. doi: 10.1093/jamia/ocaa120.
We sought to predict whether patients with type 2 diabetes mellitus (DM2) would develop 10 selected complications. Accurate prediction of complications could enable more targeted measures to prevent or slow their development.
Experiments were conducted on the Healthcare Cost and Utilization Project State Inpatient Databases of California for the period 2003 to 2011. Recurrent neural network (RNN) long short-term memory (LSTM) and RNN gated recurrent unit (GRU) deep learning models were designed and compared with traditional random forest and multilayer perceptron models. Prediction accuracy for the selected complications was compared in 3 settings, each corresponding to a minimum number of hospitalizations between the diabetes diagnosis and the diagnosis of complications.
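The 3 settings can be pictured as a cohort filter applied per complication. The sketch below is a hypothetical illustration, not the authors' pipeline. It assumes each patient's hospitalizations are available in chronological order as sets of diagnosis codes, counts hospitalizations from the first DM2 diagnosis (inclusive) up to the first complication diagnosis (exclusive), and keeps the patient only if that count reaches `min_hosp`. The names `Example`, `first_index`, and `build_example`, and the handling of patients who never develop the complication or already had it, are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical layout: a patient's hospitalizations in chronological order,
# each stored as the set of diagnosis codes recorded during that stay.
Visit = Set[str]


@dataclass
class Example:
    history: List[Visit]  # model input: hospitalizations used for prediction
    label: int            # 1 if the complication is diagnosed later, else 0


def first_index(visits: List[Visit], codes: Set[str]) -> Optional[int]:
    """Index of the first visit that contains any of `codes`, or None."""
    for i, visit in enumerate(visits):
        if visit & codes:
            return i
    return None


def build_example(visits: List[Visit], dm2_codes: Set[str],
                  complication_codes: Set[str], min_hosp: int) -> Optional[Example]:
    """Return an Example if the patient qualifies for the `min_hosp` setting, else None."""
    dm2 = first_index(visits, dm2_codes)
    if dm2 is None:
        return None  # no DM2 diagnosis: outside the cohort
    comp = first_index(visits, complication_codes)
    if comp is not None and comp <= dm2:
        return None  # assumption: complication at or before DM2 diagnosis is excluded
    if comp is None:
        # Assumption: negatives keep every hospitalization from the DM2 diagnosis on.
        history, label = visits[dm2:], 0
    else:
        # Positives keep hospitalizations from the DM2 diagnosis up to,
        # but not including, the stay where the complication first appears.
        history, label = visits[dm2:comp], 1
    if len(history) < min_hosp:
        return None  # too few hospitalizations for this setting
    return Example(history, label)
```

Calling `build_example` once per threshold would yield one cohort per setting. The abstract names 2 and 4 hospitalizations but not the remaining threshold, so `min_hosp` is left as a parameter.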
The experiments used the diagnosis domain. The best results were achieved with the RNN GRU model, followed by the RNN LSTM model. Prediction accuracy with the RNN GRU model ranged from 73% (myocardial infarction) to 83% (chronic ischemic heart disease), while the accuracy of the traditional models ranged from 66% to 76%.
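The abstract does not describe the network architecture, so the following is only a rough illustration of how a GRU could turn a sequence of hospitalizations into a complication probability. It is a minimal pure-Python sketch assuming each hospitalization is encoded as a multi-hot vector of diagnosis codes, using one common GRU formulation (reset gate applied before the recurrent product; libraries differ in gate conventions) followed by a logistic output. The names `init_params`, `gru_step`, `gru_encode`, and `predict_complication`, the initialization, and the output layer are hypothetical; training is omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def affine(W: Matrix, x: Sequence[float], U: Matrix, h: Sequence[float],
           b: Sequence[float]) -> Vector:
    """Return W x + U h + b using plain Python lists."""
    return [sum(w * xi for w, xi in zip(W[i], x))
            + sum(u * hi for u, hi in zip(U[i], h)) + b[i]
            for i in range(len(b))]


def init_params(input_size: int, hidden_size: int, scale: float = 0.1,
                seed: int = 0) -> dict:
    """Hypothetical uniform initialization for the gates and a logistic output layer."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

    p: dict = {}
    for g in ("z", "r", "n"):  # update gate, reset gate, candidate state
        p["W" + g] = mat(hidden_size, input_size)
        p["U" + g] = mat(hidden_size, hidden_size)
        p["b" + g] = [0.0] * hidden_size
    p["w_out"] = [rng.uniform(-scale, scale) for _ in range(hidden_size)]
    p["b_out"] = 0.0
    return p


def gru_step(x: Sequence[float], h: Vector, p: dict) -> Vector:
    """One GRU update: blend the previous state with a candidate state."""
    z = [sigmoid(a) for a in affine(p["Wz"], x, p["Uz"], h, p["bz"])]  # update gate
    r = [sigmoid(a) for a in affine(p["Wr"], x, p["Ur"], h, p["br"])]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a) for a in affine(p["Wn"], x, p["Un"], rh, p["bn"])]  # candidate
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def gru_encode(visits: Sequence[Sequence[float]], p: dict) -> Vector:
    """Run the GRU over the hospitalizations in order, starting from a zero state."""
    h = [0.0] * len(p["bz"])
    for x in visits:
        h = gru_step(x, h, p)
    return h


def predict_complication(visits: Sequence[Sequence[float]], p: dict) -> float:
    """Probability of the complication from the final hidden state."""
    h = gru_encode(visits, p)
    return sigmoid(sum(w * hi for w, hi in zip(p["w_out"], h)) + p["b_out"])
```

An LSTM variant would replace `gru_step` with a cell that keeps a separate memory vector and uses three gates; the GRU needs fewer parameters per hidden unit, which is one plausible reason it can behave well on modest training sets.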
The number of hospitalizations was an important factor for prediction accuracy. Experiments with 4 hospitalizations achieved significantly better accuracy than those with 2 hospitalizations. To achieve improved accuracy, the deep learning models required training on at least 1000 patients; accuracy dropped significantly when the training datasets contained only 500 patients. The prediction accuracy for complications decreased over time. Among individual complications, the best accuracy was achieved for depressive disorder and chronic ischemic heart disease.
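One way to probe a training-size effect like the one reported above is a learning curve: train on nested random subsets of patients and score every model on the same held-out test set. The helper below is a hypothetical sketch of the subsampling step only. The function name, sizes, and seed are illustrative, and the abstract does not say how the 500- and 1000-patient training sets were drawn. Sampling by patient ID (rather than by hospitalization) is assumed so that no patient's stays leak across training sets and the test set.

```python
import random
from typing import Dict, List, Sequence


def nested_training_subsets(patient_ids: Sequence[str], sizes: Sequence[int],
                            seed: int = 0) -> Dict[int, List[str]]:
    """Draw nested random subsets so each larger training set contains the smaller ones."""
    if max(sizes) > len(patient_ids):
        raise ValueError("requested subset is larger than the available patients")
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)  # one fixed shuffle shared by all sizes
    return {n: shuffled[:n] for n in sorted(sizes)}
```

Because every subset is a prefix of the same shuffle, a drop in accuracy from 1000 to 500 patients reflects the smaller sample rather than a different random draw.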
Based on these results, the RNN GRU model was the best choice for electronic medical record-type data.