Center for Information and Systems Engineering, Boston University, Boston, MA, USA.
Stat Methods Med Res. 2019 Dec;28(12):3667-3682. doi: 10.1177/0962280218810911. Epub 2018 Nov 25.
The objective is to derive a predictive model that identifies patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. A variety of supervised machine learning classification methods were tested, and a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while simultaneously deriving sparse linear support vector machine classifiers that separate the positive samples from the negative ones (non-hospitalized). Convergence of the new method was established, and theoretical guarantees were proved on how well the classifiers it produces generalize to a test set not seen during training. The methods were tested on a large set of patients from the Boston Medical Center, the largest safety net hospital in New England. The new joint clustering/classification method achieves an accuracy of 89% (measured as area under the ROC curve) and yields informative clusters that help interpret the classification results, thus increasing physicians' trust in the algorithmic output and providing some guidance towards preventive measures. While other methods can increase accuracy to 92%, this comes at the cost of increased computation and a loss of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. The proposed predictive models can help avert hospitalizations, improve health outcomes and drastically reduce hospital expenditures. The scope for savings is significant: it has been estimated that in the USA alone, about $5.8 billion is spent each year on diabetes-related hospitalizations that could be prevented.
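Because the accuracy figures above are reported as area under the ROC curve (AUC), a minimal sketch of that metric may help in reading them. AUC can be interpreted as the probability that a randomly chosen positive (hospitalized) patient receives a higher classifier score than a randomly chosen negative (non-hospitalized) one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores below are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Pairwise (rank-statistic) estimate of the area under the ROC curve.

    labels: iterable of 0/1 class labels (1 = positive, e.g. hospitalized).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 hospitalized and 4 non-hospitalized patients.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic pairwise loop is chosen here for readability; on a large patient set, a sort-based computation would give the same value more efficiently.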