Medical Science, Kawasaki Medical School, Kurashiki, Okayama, Japan.
College of Engineering, University of Michigan, Ann Arbor, Michigan, United States of America.
PLoS One. 2020 May 29;15(5):e0233491. doi: 10.1371/journal.pone.0233491. eCollection 2020.
Although dialysis patients are at high risk of death, it is difficult for medical practitioners to evaluate many interrelated risk factors simultaneously. In this study, we used a machine learning model to characterize hemodialysis patients and assessed its usefulness for screening patients at high risk of death within one year, using the nationwide database of the Japanese Society for Dialysis Therapy.
The patients were divided into two datasets (n = 39,930 each). Using the development dataset, we grouped hemodialysis patients in Japan into new clusters generated by the K-means clustering method. The association between cluster membership and the risk of death was evaluated using multivariate Cox proportional hazards models. We then developed an ensemble model composed of the clusters and support vector machine models in the model development phase, and compared the accuracy of mortality prediction across the machine learning models in the model validation phase.
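The abstract does not say how the clusters and the support vector machine were combined, so the following is only a minimal sketch of one plausible arrangement: K-means is fitted on a development half, and each patient's cluster membership is appended as a one-hot feature for a single SVM that is then scored on a held-out half. The synthetic data, the feature count, the RBF kernel, and the helper `with_cluster` are all illustrative assumptions, not details taken from the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for patient features and a binary 1-year outcome
# (assumption: the real registry variables are not available here).
n, d = 600, 4
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 1.0).astype(int)

# Two equally sized datasets: development and validation.
X_dev, X_val = X[: n // 2], X[n // 2 :]
y_dev, y_val = y[: n // 2], y[n // 2 :]

# Standardize using development-set statistics only.
scaler = StandardScaler().fit(X_dev)
Z_dev, Z_val = scaler.transform(X_dev), scaler.transform(X_val)

# Five clusters, fitted on the development dataset.
k = 5
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Z_dev)
c_dev = kmeans.labels_
c_val = kmeans.predict(Z_val)


def with_cluster(Z, c, k=k):
    """Append one-hot cluster membership to the feature matrix (hypothetical helper)."""
    return np.hstack([Z, np.eye(k)[c]])


# A single SVM that sees both the features and the cluster assignment.
svm = SVC(kernel="rbf").fit(with_cluster(Z_dev, c_dev), y_dev)
pred = svm.predict(with_cluster(Z_val, c_val))
accuracy = float((pred == y_val).mean())
print(f"validation accuracy: {accuracy:.3f}")
```

Other designs, such as training one classifier per cluster, would also fit the description; the one-hot variant is used here only because it stays well defined when a cluster contains a single outcome class.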
The average age of the subjects was 65.7±12.2 years; 32.7% had diabetes mellitus. The five clusters clearly distinguished groups on the basis of their characteristics: Cluster 1, young male and chronic glomerulonephritis; Cluster 2, female and chronic glomerulonephritis; Cluster 3, diabetes mellitus; Cluster 4, elderly and nephrosclerosis; Cluster 5, elderly and protein energy wasting. These clusters were associated with the risk of death: for Cluster 5 compared with Cluster 1, the hazard ratio was 8.86 (95% CI 7.68, 10.21). The accuracy of the ensemble model for predicting 1-year death was 0.948, higher than that of the logistic regression model (0.938), the support vector machine model (0.937), and the deep learning model (0.936).
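As a rough consistency check on the reported hazard ratio, the sketch below back-calculates the log hazard ratio and its standard error from the published interval. It assumes the interval is a Wald-type interval that is symmetric on the log scale, which is the usual convention for Cox models but is not stated in the abstract.

```python
import math

# Values reported in the text for Cluster 5 versus Cluster 1.
hr, ci_lo, ci_hi = 8.86, 7.68, 10.21
z = 1.959964  # 97.5th percentile of the standard normal distribution

# Log hazard ratio (the Cox regression coefficient under this assumption).
beta = math.log(hr)

# Standard error implied by the width of the interval on the log scale.
se = (math.log(ci_hi) - math.log(ci_lo)) / (2 * z)

# Rebuild the interval from beta and se to see how well it matches.
rebuilt_lo = math.exp(beta - z * se)
rebuilt_hi = math.exp(beta + z * se)

print(f"beta = {beta:.3f}, se = {se:.4f}")
print(f"rebuilt 95% CI: ({rebuilt_lo:.2f}, {rebuilt_hi:.2f})")
```

If the rebuilt bounds agree with the published ones to within rounding, the point estimate sits close to the geometric midpoint of the interval, as expected for an interval constructed on the log scale.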
The clusters clearly categorized patients by their characteristics and reflected their prognosis. Our real-world-data-based machine learning system is applicable to identifying high-risk hemodialysis patients in clinical settings and has strong potential to guide treatment and improve prognosis.