Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, United States.
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, United States.
JMIR Aging. 2024 Jul 8;7:e54748. doi: 10.2196/54748.
Alzheimer disease and related dementias (ADRD) rank as the sixth leading cause of death in the United States, underlining the importance of accurate ADRD risk prediction. While recent advancements in ADRD risk prediction have primarily relied on imaging analysis, not all patients undergo medical imaging before an ADRD diagnosis. Merging machine learning with claims data can reveal additional risk factors and uncover interconnections among diverse medical codes.
This study aims to use graph neural networks (GNNs) with claims data for ADRD risk prediction. To address the lack of human-interpretable reasons behind these predictions, we introduce a self-explainable method that evaluates the importance of relationships and their influence on ADRD risk prediction.
We used a variationally regularized encoder-decoder GNN (variational GNN [VGNN]) integrated with our proposed relation importance method to estimate ADRD likelihood. This self-explainable method provides a feature importance explanation for ADRD risk prediction by leveraging relational information within a graph. Three scenarios, with 1-year, 2-year, and 3-year prediction windows, were created to assess the model's performance. Random forest (RF) and light gradient boosting machine (LGBM) were used as baselines. Using the relation importance method, we further identified the key relationships for ADRD risk prediction.
In scenario 1, the VGNN model achieved area under the receiver operating characteristic curve (AUROC) scores of 0.7272 and 0.7480 on the small subset and the matched cohort data set, respectively. It outperformed RF and LGBM by 10.6% and 9.1%, respectively, on average. In scenario 2, it achieved AUROC scores of 0.7125 and 0.7281, surpassing the other models by 10.5% and 8.9%, respectively. Similarly, in scenario 3, it obtained AUROC scores of 0.7001 and 0.7187, exceeding the baseline models by 10.1% and 8.5%, respectively. These results show that the graph-based approach consistently outperformed the tree-based models (RF and LGBM) in predicting ADRD across all 3 prediction windows. Furthermore, the integration of the VGNN model and our relation importance interpretation could provide valuable insight into paired factors that may contribute to or delay ADRD progression.
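For readers less familiar with the metric, the following is a minimal sketch of how an AUROC score and a percentage gain over a baseline could be computed. It is not the study's evaluation code. The function names `auroc` and `relative_gain` and the toy labels and scores are hypothetical, and the abstract does not state whether the reported percentages are relative gains or absolute differences, so the relative form used here is an assumption.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUROC: the fraction of (positive, negative) pairs in which
    the positive case receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_gain(model_auroc: float, baseline_auroc: float) -> float:
    """Percentage gain of the model over a baseline.
    Assumption: gain is expressed relative to the baseline score."""
    return (model_auroc - baseline_auroc) / baseline_auroc * 100.0


# Toy data for illustration only; these are not values from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
toy_gain = relative_gain(toy_auroc, 0.60)
```

The pairwise form is quadratic in cohort size, so it only suits small examples. On cohorts the size of a claims data set, a rank-based implementation from an established library would be the practical choice.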
Using our innovative self-explainable method with claims data enhances ADRD risk prediction and provides insights into the impact of interconnected medical code relationships. This methodology not only enables ADRD risk modeling but also shows potential for other image analysis predictions using claims data.