Department of Biomedical Informatics, University of Utah, Suite 140, 421 Wakara Way, Salt Lake City, UT 84108 USA.
Health Inf Sci Syst. 2016 Mar 8;4:2. doi: 10.1186/s13755-016-0015-4. eCollection 2016.
Predictive modeling is a key component of solutions to many healthcare problems. Among all predictive modeling approaches, machine learning methods often achieve the highest prediction accuracy, but suffer from a long-standing open problem precluding their widespread use in healthcare. Most machine learning models give no explanation for their prediction results, whereas interpretability is essential for a predictive model to be adopted in typical healthcare settings.
This paper presents the first complete method for automatically explaining results for any machine learning predictive model without degrading accuracy. We implemented the method in software. Using the electronic medical record data set from the Practice Fusion diabetes classification competition, which contains patient records from all 50 states in the United States, we demonstrated the method on predicting type 2 diabetes diagnosis within the next year.
For the champion machine learning model of the competition, our method explained prediction results for 87.4% of the patients whom the model correctly predicted to receive a type 2 diabetes diagnosis within the next year.
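To make the denominator of this figure concrete, the following is a minimal sketch of how such a coverage rate could be computed: the share of correctly predicted positive cases for which an explanation was produced. The function name `explanation_coverage`, its arguments, and the toy data are hypothetical illustrations, not the authors' implementation or the study's data.

```python
def explanation_coverage(y_true, y_pred, has_explanation):
    """Fraction of true positives for which an explanation was produced.

    All three arguments are parallel sequences with one entry per patient
    (hypothetical layout, assumed for this sketch):
      y_true          -- 1 if the patient was actually diagnosed, else 0
      y_pred          -- 1 if the model predicted a diagnosis, else 0
      has_explanation -- True if the method produced an explanation
    """
    # Keep only the patients the model correctly predicted as positive.
    explained = [
        e for t, p, e in zip(y_true, y_pred, has_explanation)
        if t == 1 and p == 1
    ]
    if not explained:
        raise ValueError("no correctly predicted positive cases")
    return sum(explained) / len(explained)


# Toy data for illustration only; these are not values from the study.
y_true = [1, 1, 1, 0, 1, 0, 1, 1]
y_pred = [1, 1, 0, 1, 1, 0, 1, 1]
has_explanation = [True, True, False, True, False, False, True, True]

coverage = explanation_coverage(y_true, y_pred, has_explanation)
print(f"{coverage:.1%}")  # 5 true positives, 4 explained -> 80.0%
```

Under this reading, false positives and missed cases do not enter the rate; only the patients who were correctly predicted to be diagnosed are counted.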
Our demonstration showed the feasibility of automatically explaining results for any machine learning predictive model without degrading accuracy.