Evaluating the impact of covariate lookback times on performance of patient-level prediction models.

Author information

Observational Health Data Sciences and Informatics Community, New York, NY, USA.

Department of Epidemiology, Janssen Research & Development, LLC, 1125 Trenton-Harbourton Road, Titusville, NJ, 08560, USA.

Publication information

BMC Med Res Methodol. 2021 Aug 28;21(1):180. doi: 10.1186/s12874-021-01370-2.

Abstract

BACKGROUND

Our study examines how the lookback length used in feature engineering affects predictive models developed from observational healthcare data. A longer lookback gives more insight into patients but worsens the problem of left-censoring.

METHODS

We used five US observational databases to develop patient-level prediction models. A target cohort of subjects with hypertensive drug exposures and outcome cohorts of subjects with acute (stroke and gastrointestinal bleeding) and chronic outcomes (diabetes and chronic kidney disease) were developed. Candidate predictors that exist on or prior to the target index date were derived within the following lookback periods: 14, 30, 90, 180, 365, and 730 days, and all days prior to index. We predicted the risk of outcomes occurring from 1 day until 365 days after index. For each lookback period, ten lasso logistic models were generated to create a distribution of area under the curve (AUC) metrics for evaluating the discriminative performance of the models. Calibration intercept and slope were also calculated. Impact on external validation performance was investigated across the five databases.
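The lookback-restricted covariate derivation described above can be illustrated with a minimal sketch. It is not the study's implementation: the `(code, event_date)` record layout, the function name `covariates_in_lookback`, the inclusive window boundary, and the toy patient history are all assumptions made for illustration.

```python
from datetime import date

# Lookback windows from the abstract; None stands for "all days prior to index".
LOOKBACKS = [14, 30, 90, 180, 365, 730, None]

def covariates_in_lookback(events, index_date, lookback_days):
    """Return the set of codes recorded on or before index within the window.

    events: iterable of (code, event_date) pairs (hypothetical record layout).
    lookback_days: window length in days, or None for all prior history.
    The window boundary is treated as inclusive, which is an assumption.
    """
    selected = set()
    for code, event_date in events:
        days_before = (index_date - event_date).days
        if days_before < 0:
            continue  # recorded after index: not a candidate predictor
        if lookback_days is None or days_before <= lookback_days:
            selected.add(code)
    return selected

# Toy patient history (made-up codes and dates, for illustration only).
index_date = date(2015, 6, 1)
events = [
    ("obesity", date(2015, 5, 25)),  # 7 days before index
    ("statin", date(2015, 3, 1)),    # 92 days before index
    ("gout", date(2013, 1, 15)),     # 867 days before index
    ("stroke", date(2015, 7, 1)),    # after index, ignored
]

# One binary covariate set per lookback period.
features_by_lookback = {
    lb: covariates_in_lookback(events, index_date, lb) for lb in LOOKBACKS
}
```

In this toy history, the record from 867 days before index enters the covariate set only under the all-time lookback, which is the kind of information a short window discards.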

RESULTS

The maximum difference in AUC between models developed using different lookback periods within a database was < 0.04 for diabetes (in MDCR, AUC of 0.593 with 14-day lookback vs. AUC of 0.631 with all-time lookback) and 0.012 for renal impairment (in MDCR, AUC of 0.675 with 30-day lookback vs. AUC of 0.687 with 365-day lookback). For the acute outcomes, the maximum difference in AUC across lookbacks within a database was 0.015 for stroke (in MDCD, AUC of 0.767 with 14-day lookback vs. AUC of 0.782 with 365-day lookback) and < 0.03 for gastrointestinal bleeding (in CCAE, AUC of 0.631 with 14-day lookback vs. AUC of 0.660 with 730-day lookback).
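As a quick arithmetic check, the reported maximum differences can be recomputed from the AUC pairs quoted above. The dictionary layout and variable names are illustrative assumptions; the AUC values are the ones stated in the results.

```python
# AUC pairs quoted in the results: (database, outcome) -> (lower AUC, higher AUC).
reported = {
    ("MDCR", "diabetes"): (0.593, 0.631),
    ("MDCR", "renal impairment"): (0.675, 0.687),
    ("MDCD", "stroke"): (0.767, 0.782),
    ("CCAE", "gastrointestinal bleeding"): (0.631, 0.660),
}

# Difference between the best and worst lookback, rounded to three decimals.
differences = {
    key: round(high - low, 3) for key, (low, high) in reported.items()
}

for (database, outcome), diff in differences.items():
    print(f"{outcome} ({database}): {diff:.3f}")
```

The recomputed values are 0.038 for diabetes and 0.029 for gastrointestinal bleeding, consistent with the stated bounds of < 0.04 and < 0.03, and 0.012 and 0.015 for renal impairment and stroke.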

CONCLUSIONS

In general, the choice of covariate lookback had only a small impact on discrimination and calibration, with a short lookback (< 180 days) occasionally decreasing discrimination. Based on the results, when training a logistic regression model for prediction, using covariates with a 365-day lookback appears to be a good tradeoff between performance and interpretation.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eaa9/8403343/72ef4ae87be7/12874_2021_1370_Fig1_HTML.jpg