Enhancing risk prediction based on health administrative data using high-dimensional prediction model.

Authors

Hossain Md Belal, Sadatsafavi Mohsen, Wong Hubert, Cook Victoria J, Johnston James C, Karim Mohammad Ehsanul

Affiliations

School of Population and Public Health, University of British Columbia, Vancouver, British Columbia, Canada; Centre for Advancing Health Outcomes, St. Paul's Hospital, Vancouver, British Columbia, Canada.

Respiratory Evaluation Sciences Program, Collaboration for Outcomes Research and Evaluation, Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, British Columbia, Canada.

Publication information

J Clin Epidemiol. 2025 May 30;184:111857. doi: 10.1016/j.jclinepi.2025.111857.

Abstract

OBJECTIVES

Health administrative datasets often do not contain important clinical variables for predicting the risk of medical outcomes. However, they often contain a wide range of health-care variables that can be used to develop a high-dimensional prediction model (hdPM) that compensates for the lack of clinical predictors. We aimed to compare the predictive performance of an hdPM with a conventional model that relies only on investigator-specified clinical predictors.

STUDY DESIGN AND SETTING

Using data on 2923 individuals diagnosed with tuberculosis (TB), we used plasmode simulation with a Cox proportional hazards model to simulate a time-to-event outcome. We considered two scenarios: one in which strong predictors were unavailable in the development sample and one in which weak predictors were unavailable. Conventional models and hdPMs were fitted without and with least absolute shrinkage and selection operator (LASSO) shrinkage and were compared in terms of internally validated time-dependent c-statistic and calibration.
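The outcome-generation step can be illustrated with a minimal sketch. In a plasmode simulation the observed covariates are kept and only the outcome is simulated from a specified model. The sketch below assumes a constant baseline hazard, so event times can be drawn by inverse-transform sampling, plus administrative censoring at a fixed time. The function name, the baseline rate, and the linear predictor values are hypothetical and are not taken from the study.

```python
import math
import random


def simulate_event_times(linear_predictors, baseline_rate, censor_time, rng):
    """Draw one (time, event) pair per subject under a Cox model.

    Assumes a constant baseline hazard (an illustrative simplification),
    so the subject-specific hazard is baseline_rate * exp(lp).
    """
    results = []
    for lp in linear_predictors:
        hazard = baseline_rate * math.exp(lp)
        # Inverse-transform sampling: T = -log(U) / hazard with U in (0, 1].
        u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u avoids log(0)
        event_time = -math.log(u) / hazard
        if event_time <= censor_time:
            results.append((event_time, 1))   # event observed
        else:
            results.append((censor_time, 0))  # administratively censored
    return results


# Hypothetical linear predictors: a low-risk and a high-risk group.
rng = random.Random(42)
low_risk = simulate_event_times([0.0] * 2000, 0.1, 5.0, rng)
high_risk = simulate_event_times([2.0] * 2000, 0.1, 5.0, rng)
```

In a real plasmode setting the linear predictors would come from coefficients applied to the observed covariates, and the baseline hazard would usually be estimated from the data rather than fixed.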

RESULTS

The hdPMs achieved a better time-dependent c-statistic than the conventional model, both in predicting TB mortality and in our simulations. Compared to a c-statistic of 0.78 for the conventional model with a strong unobserved predictor, LASSO-based hdPMs had a c-statistic of 0.90. While non-penalized hdPMs exhibited overfitting, LASSO-based hdPMs demonstrated superior cross-validated discrimination and calibration. Results were consistent in sensitivity analyses with varying numbers of additional health-care variables and different outcome types.
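To make the discrimination measure concrete, the sketch below computes Harrell's concordance index for right-censored data. This is a simpler relative of the time-dependent c-statistic reported here, not the same estimator, and the function name is hypothetical. A pair of subjects is comparable when the one with the shorter follow-up time had an observed event; the pair is concordant when that subject also has the higher predicted risk.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    risk_scores: higher values mean higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to predictions no better than chance and 1.0 to perfect ranking, which is the scale on which the reported 0.78 and 0.90 should be read.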

CONCLUSION

Health administrative data can compensate for the lack of known and important clinical variables by drawing on the many health-care variables available in linked databases. hdPMs, especially those with LASSO regularization, substantially enhance predictive accuracy and offer a robust approach for risk stratification and assessment in epidemiological research.

PLAIN LANGUAGE SUMMARY

Researchers develop prediction models with only clinical variables. But health administrative data often do not contain some clinical variables. For example, smoking, weight, height, physical activity, and diet data are unavailable. These data do contain codes, such as International Classification of Diseases (ICD)-9/10 diagnostic codes. We transformed these codes into binary and count variables. We created models to predict tuberculosis mortality. The models were not very accurate when using only clinical variables. Accuracy improved when we added the codes. We can use this kind of model in policy and research. For example, we can identify people at high mortality risk. We can then design interventions for the high-risk group.
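The transformation of codes into binary and count variables can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function name, the `_ever` and `_count` suffixes, the `min_patients` prevalence filter, and the example records are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_code_features(records, min_patients=1):
    """Turn (patient_id, code) records into binary and count features.

    Codes recorded for fewer than min_patients patients are dropped
    (a hypothetical prevalence filter). Patients with no records at all
    do not appear in the output.
    """
    counts = defaultdict(Counter)
    for patient_id, code in records:
        counts[patient_id][code] += 1

    # Number of distinct patients in whom each code appears.
    prevalence = Counter()
    for code_counts in counts.values():
        prevalence.update(code_counts.keys())
    kept = sorted(code for code, n in prevalence.items() if n >= min_patients)

    features = {}
    for patient_id, code_counts in counts.items():
        row = {}
        for code in kept:
            n = code_counts.get(code, 0)
            row[f"{code}_ever"] = 1 if n > 0 else 0  # binary: ever recorded
            row[f"{code}_count"] = n                 # count: times recorded
        features[patient_id] = row
    return features


# Illustrative records only; the codes are examples of ICD-10 categories.
records = [("p1", "J44"), ("p1", "J44"), ("p1", "E11"), ("p2", "E11")]
features = build_code_features(records)
```

Each code contributes two columns, so a few thousand distinct codes quickly produce a high-dimensional predictor set, which is the situation in which shrinkage methods such as LASSO become useful.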
