Zhou Fang, Gillespie Avrum, Gligorijevic Djordje, Gligorijevic Jelena, Obradovic Zoran
School of Data Science & Engineering, East China Normal University, Shanghai, China.
Division of Nephrology, Hypertension, and Kidney Transplantation, Department of Medicine, Lewis Katz School of Medicine, Temple University, Philadelphia, PA, United States.
J Biomed Inform. 2020 May;105:103409. doi: 10.1016/j.jbi.2020.103409. Epub 2020 Apr 15.
Accurate prediction of the progression of Chronic Kidney Disease (CKD) to End Stage Renal Disease (ESRD) is of great importance to clinicians and a challenge to researchers, because CKD has many causes and even more comorbidities that traditional prediction models ignore. We examine whether a novel low-dimensional embedding model, disease2disease (D2D), learned from large-scale electronic health records (EHRs), can cluster the causes of kidney disease and its comorbidities well, and whether it improves prediction of progression from CKD to ESRD compared with traditional risk factors. The study cohort consists of 2,507 hospitalized Stage 3 CKD patients, of whom 1,375 (54.8%) progressed to ESRD within 3 years. We evaluated the proposed unsupervised learning framework by applying a regularized logistic regression model and a Cox proportional hazards model, and compared their accuracies with those obtained by four alternative models. The results demonstrate that the low-dimensional disease representations learned from EHRs can capture the relationships among a vast array of diseases and can outperform traditional risk factors in a CKD progression prediction model. These results can be used both by clinicians in patient care and by researchers developing new prediction methods.
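The evaluation step described in the abstract (feeding learned disease representations into a regularized logistic regression) can be illustrated with a minimal sketch. This is not the authors' D2D implementation: the abstract does not say how disease vectors are combined per patient, so mean pooling is an assumption here, and the code names (`dx_a`, `dx_b`, ...), embedding values, labels, and hyperparameters are all hypothetical toy inputs.

```python
import math


def patient_vector(codes, embeddings):
    """Average the embedding vectors of a patient's diagnosis codes.

    Mean pooling is an assumption made for illustration; codes without
    an embedding are skipped, and a patient with no known codes maps to
    the zero vector.
    """
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[c] for c in codes if c in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Predicted probability of the positive class (here: progression)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_l2(X, y, lam=0.01, lr=0.5, epochs=500):
    """L2-regularized logistic regression fitted by batch gradient descent.

    lam is the L2 penalty strength applied to the weights (not the bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical 2-dimensional disease embeddings (toy values, not D2D output).
toy_embeddings = {
    "dx_a": [1.0, 0.0],
    "dx_b": [0.8, 0.2],
    "dx_c": [0.0, 1.0],
    "dx_d": [0.2, 0.8],
}

# Hypothetical patients: lists of diagnosis codes with a made-up binary label
# (1 = progressed, 0 = did not progress).
toy_patients = [
    (["dx_a"], 1),
    (["dx_a", "dx_b"], 1),
    (["dx_c"], 0),
    (["dx_c", "dx_d"], 0),
]

X = [patient_vector(codes, toy_embeddings) for codes, _ in toy_patients]
y = [label for _, label in toy_patients]
w, b = fit_logistic_l2(X, y)
```

Calling `predict_proba(patient_vector(["dx_b"], toy_embeddings), w, b)` then scores a new toy patient. A survival-analysis counterpart, as in the Cox model the abstract mentions, would additionally need time-to-event and censoring information, which this sketch does not model.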