Diaz-Ramirez L Grisell, Lee Sei J, Smith Alexander K, Gan Siqi, Boscardin W John
Division of Geriatrics, University of California, San Francisco, 490 Illinois Street, Floor 08, Box 1265, San Francisco, CA 94143, United States; San Francisco Veterans Affairs (VA) Medical Center, 4150 Clement Street, 181G, San Francisco, CA 94121, United States.
Comput Methods Programs Biomed. 2021 Jun;204:106073. doi: 10.1016/j.cmpb.2021.106073. Epub 2021 Mar 27.
Most methods for developing clinical prognostic models focus on identifying parsimonious and accurate models to predict a single outcome; however, patients and providers often want to predict multiple outcomes simultaneously. As an example, for older adults one is often interested in predicting nursing home admission as well as mortality. We propose and evaluate a novel predictor-selection computing method for multiple outcomes and provide the code for its implementation.
Our proposed algorithm, the Best Average BIC (baBIC) method, selects the best subset of common predictors based on the minimum average normalized Bayesian Information Criterion (BIC) across outcomes. We compared the predictive accuracy (Harrell's C-statistic) and parsimony (number of predictors) of the model obtained using the baBIC method with: 1) a subset of common predictors obtained from the union of optimal models for each outcome (Union method), 2) a subset obtained from the intersection of optimal models for each outcome (Intersection method), and 3) a model with no variable selection (Full method). We used case-study data from the Health and Retirement Study (HRS) to demonstrate our method and conducted a simulation study to investigate its performance.
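The selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the BIC values, predictor names, and outcome names are made up for illustration, the candidate subsets are assumed to have been fitted already, and the normalization shown (min-max scaling of each outcome's BIC over the candidate subsets) is an assumption, since the abstract does not state the exact normalization used.

```python
def normalize(bic_by_subset):
    """Min-max scale one outcome's BIC values to [0, 1] (assumed normalization)."""
    lo = min(bic_by_subset.values())
    hi = max(bic_by_subset.values())
    span = hi - lo
    return {s: (b - lo) / span if span else 0.0 for s, b in bic_by_subset.items()}


def best_average_bic(bic_by_outcome):
    """Return the subset with the smallest average normalized BIC, plus all averages."""
    normalized = {o: normalize(b) for o, b in bic_by_outcome.items()}
    # Only subsets evaluated for every outcome can be compared.
    subsets = set.intersection(*(set(b) for b in bic_by_outcome.values()))
    avg = {
        s: sum(normalized[o][s] for o in normalized) / len(normalized)
        for s in subsets
    }
    # Ties are broken in favor of the smaller subset, then alphabetically.
    best = min(avg, key=lambda s: (avg[s], len(s), sorted(s)))
    return best, avg


# Hypothetical BIC values for three candidate subsets and two outcomes.
A = frozenset({"age"})
AS = frozenset({"age", "sex"})
ASD = frozenset({"age", "sex", "diabetes"})
bic_by_outcome = {
    "mortality": {A: 1000.0, AS: 980.0, ASD: 990.0},
    "nursing_home": {A: 520.0, AS: 504.0, ASD: 500.0},
}

best, avg = best_average_bic(bic_by_outcome)

# Comparators: union and intersection of the per-outcome optimal subsets.
optimal = {o: min(b, key=b.get) for o, b in bic_by_outcome.items()}
union_subset = frozenset().union(*optimal.values())
intersection_subset = frozenset.intersection(*optimal.values())

print("baBIC:", sorted(best))
print("Union:", sorted(union_subset))
print("Intersection:", sorted(intersection_subset))
```

In this toy input the Union subset contains every predictor, while the baBIC subset drops the one predictor that helps only a single outcome, which mirrors the parsimony contrast the study examines. A real application would enumerate or search over many more candidate subsets and fit a survival model for each outcome.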
In the case-study data and simulations, the average Harrell's C-statistics across outcomes of the models obtained with the baBIC and Union methods were comparable. Despite the similar discrimination, the baBIC method produced more parsimonious models than the Union method. In contrast, the models selected with the Intersection method were the most parsimonious but had the worst predictive accuracy, and the opposite was true of the Full method. In the simulations, the baBIC method performed well: most of the time it identified many of the predictors selected in the baBIC model of the case-study data, and in the majority of simulations it excluded those not selected.
Our method identified a common subset of variables to predict multiple clinical outcomes with a better balance between parsimony and predictive accuracy than current methods.
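For readers unfamiliar with the accuracy measure, the following is a minimal sketch of Harrell's C-statistic for right-censored data, averaged across two outcomes. The data are invented, the function name `harrell_c` is hypothetical, and pairs with tied event times are simply skipped, which is a simplifying assumption rather than the handling used in the study.

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of comparable pairs ordered correctly by predicted risk.

    A pair (i, j) is comparable when subject i had an observed event strictly
    before subject j's follow-up time. Tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times, event indicators (1 = event, 0 = censored),
# and predicted risk scores for two outcomes.
c_mortality = harrell_c(
    times=[2, 4, 6, 8], events=[1, 1, 0, 1], risks=[0.9, 0.4, 0.5, 0.1]
)
c_nursing_home = harrell_c(
    times=[3, 1, 5, 7], events=[1, 1, 1, 0], risks=[0.9, 0.4, 0.5, 0.1]
)

# Average discrimination across outcomes, as compared between methods.
average_c = (c_mortality + c_nursing_home) / 2
print(round(c_mortality, 3), round(c_nursing_home, 3), round(average_c, 3))
```

The quadratic loop is adequate for illustration; established survival-analysis packages provide faster and more carefully tie-handled implementations.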