Qiu Wei, Chen Hugh, Dincer Ayse Berceste, Lundberg Scott, Kaeberlein Matt, Lee Su-In
Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA USA.
Microsoft Research, Redmond, WA USA.
Commun Med (Lond). 2022 Oct 3;2:125. doi: 10.1038/s43856-022-00180-x. eCollection 2022.
Unlike the linear models traditionally used to study all-cause mortality, complex machine learning models can capture non-linear interrelations and offer opportunities to identify unexplored risk factors. Explainable artificial intelligence (XAI) can improve prediction accuracy over linear models and reveal insights into outcomes such as mortality. This paper comprehensively analyzes all-cause mortality by explaining complex machine learning models.
We propose the IMPACT framework, which uses an XAI technique to explain a state-of-the-art tree-ensemble mortality prediction model. We apply IMPACT to study all-cause mortality at 1-, 3-, 5-, and 10-year follow-up times within the National Health and Nutrition Examination Survey (NHANES) dataset, which contains 47,261 samples and 151 features.
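The pipeline described above (a tree-ensemble mortality classifier whose predictions are then explained) can be sketched in miniature. This is a hedged illustration, not the paper's implementation: the data, feature names (`age`, `sbp`, `noise`), and model settings are synthetic, and scikit-learn's permutation importance stands in for the paper's XAI technique as a simple model-agnostic attribution method.

```python
# Minimal sketch of a tree-ensemble risk model plus a feature-attribution
# step, on synthetic data. Not the IMPACT implementation.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
age = rng.uniform(20, 90, n)          # hypothetical feature
sbp = rng.normal(125, 18, n)          # hypothetical systolic blood pressure
# Synthetic "mortality" label: driven mainly by age, weakly by blood pressure.
logit = 0.08 * (age - 60) + 0.02 * (sbp - 125) + rng.normal(0, 1, n)
y = (logit > 0).astype(int)
X = np.column_stack([age, sbp, rng.normal(size=n)])  # third column is pure noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Attribution step: accuracy drop when each feature is shuffled on held-out data.
imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, score in zip(["age", "sbp", "noise"], imp.importances_mean):
    print(f"{name}: {score:.3f}")
```

On this synthetic setup, the dominant driver (`age`) receives the largest importance and the noise column receives roughly zero, which mirrors how an explained risk model separates genuine risk factors from irrelevant features.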
We show that IMPACT models achieve higher accuracy than linear models and neural networks. Using IMPACT, we identify several overlooked risk factors and interaction effects. Furthermore, we identify relationships between laboratory features and mortality that may suggest adjusting established reference intervals. Finally, we develop highly accurate, efficient, and interpretable mortality risk scores that can be used by medical professionals and by individuals without medical expertise. We ensure generalizability by performing temporal validation of the mortality risk scores and external validation of important findings with the UK Biobank dataset.
IMPACT's unique strength is its explainable predictions, which provide insight into the complex, non-linear relationships between mortality and features while maintaining high accuracy. Our explainable risk scores could help individuals improve awareness of their own health status and help clinicians identify high-risk patients. IMPACT takes a consequential step toward bringing contemporary developments in XAI to epidemiology.