Shao Yijun, Ahmed Ali, Zamrini Edward Y, Cheng Yan, Goulet Joseph L, Zeng-Treitler Qing
Department of Clinical Research and Leadership, School of Medicine and Health Sciences, George Washington University, Washington, DC 20037, USA.
Washington DC VA Medical Center, Washington, DC 20422, USA.
J Pers Med. 2023 Jan 26;13(2):217. doi: 10.3390/jpm13020217.
Deep neural networks (DNNs) are a powerful technology used by a growing number and range of research projects, including disease risk prediction models. One of the key strengths of DNNs is their ability to model non-linear relationships, including covariate interactions. We developed a novel method, called interaction scores, for measuring the covariate interactions captured by DNN models. Because the method is model-agnostic, it can also be applied to other types of machine learning models. It is designed as a generalization of the coefficient of the interaction term in a logistic regression; hence, its values are easily interpretable. The interaction score can be calculated at both the individual level and the population level. The individual-level score provides an individualized explanation of covariate interactions. We applied this method to two simulated datasets and a real-world clinical dataset on Alzheimer's disease and related dementia (ADRD). We also applied two existing interaction measurement methods to those datasets for comparison. The results on the simulated datasets showed that the interaction score method can explain the underlying interaction effects, that the population-level interaction scores correlate strongly with the ground truth values, and that the individual-level interaction scores vary when the interaction was designed to be non-uniform. As further validation of the method, the interactions discovered from the ADRD data included both known and novel relationships.
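The abstract does not give the formula for the interaction score, so the following is only an illustrative sketch of the general idea it describes: a model-agnostic quantity that reduces to the interaction coefficient when the model is a logistic regression. It assumes two binary covariates and uses a double difference of predicted log-odds, which may differ from the authors' actual definition. The names `individual_interaction_score`, `population_interaction_score`, `uniform_model`, and `nonuniform_model`, as well as the toy coefficients and rows, are hypothetical.

```python
import math
from typing import Callable, Dict, List

Row = Dict[str, float]
Predictor = Callable[[Row], float]  # any model that maps a row to a probability


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def individual_interaction_score(predict: Predictor, row: Row, a: str, b: str) -> float:
    """Double difference of predicted log-odds over binary covariates a and b.

    All other covariates keep the values they have in `row`.
    ASSUMPTION: this is a stand-in definition, not the paper's formula.
    """
    def log_odds(value_a: float, value_b: float) -> float:
        modified = dict(row)  # copy so the caller's row is not mutated
        modified[a] = value_a
        modified[b] = value_b
        return logit(predict(modified))

    return log_odds(1, 1) - log_odds(1, 0) - log_odds(0, 1) + log_odds(0, 0)


def population_interaction_score(predict: Predictor, rows: List[Row], a: str, b: str) -> float:
    """Average of the individual-level scores over a set of rows."""
    scores = [individual_interaction_score(predict, r, a, b) for r in rows]
    return sum(scores) / len(scores)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def uniform_model(row: Row) -> float:
    """Toy logistic regression with interaction coefficient 0.8 for x1 * x2."""
    z = -1.0 + 0.5 * row["x1"] - 0.3 * row["x2"] + 0.8 * row["x1"] * row["x2"] + 0.2 * row["age"]
    return sigmoid(z)


def nonuniform_model(row: Row) -> float:
    """Toy model whose x1 * x2 interaction is present only when age > 0."""
    strength = 0.8 if row["age"] > 0 else 0.0
    z = -1.0 + 0.5 * row["x1"] - 0.3 * row["x2"] + strength * row["x1"] * row["x2"]
    return sigmoid(z)


# Hypothetical toy data; "age" is a standardized continuous covariate.
rows = [
    {"x1": 0, "x2": 1, "age": 0.1},
    {"x1": 1, "x2": 1, "age": -1.2},
    {"x1": 1, "x2": 0, "age": 2.0},
]

uniform_population = population_interaction_score(uniform_model, rows, "x1", "x2")
nonuniform_individual = [
    individual_interaction_score(nonuniform_model, r, "x1", "x2") for r in rows
]
```

For the toy logistic regression, every term except the product term cancels in the double difference, so the population-level score recovers the interaction coefficient (0.8, up to floating-point error). For the non-uniform toy model, the individual-level scores differ between rows depending on `age`, which is the kind of variation the abstract reports for the simulated data with non-uniform interactions.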