Pan Yinghao, Laber Eric B, Smith Maureen A, Zhao Ying-Qi
Department of Mathematics and Statistics, University of North Carolina at Charlotte.
Department of Statistics, North Carolina State University.
J Am Stat Assoc. 2023;118(542):1090-1101. doi: 10.1080/01621459.2021.1978467. Epub 2021 Nov 30.
Uncontrolled glycated hemoglobin (HbA1c) levels are associated with adverse events among complex diabetic patients. These adverse events pose serious health risks to affected patients and are associated with significant financial costs. Thus, a high-quality predictive model that could identify high-risk patients, and thereby inform preventative treatment, has the potential to improve patient outcomes while reducing healthcare costs. Because the biomarker information needed to predict risk is costly and burdensome to collect, such a model should ideally collect only as much information on each patient as is needed to render an accurate prediction. We propose a sequential predictive model that uses accumulating longitudinal patient data to classify patients as high-risk, low-risk, or uncertain. Patients classified as high-risk are recommended to receive preventative treatment, and those classified as low-risk are recommended for standard care. Patients classified as uncertain are monitored until a high-risk or low-risk determination is made. We construct the model using claims and enrollment files from Medicare linked with patient Electronic Health Records (EHR) data. The proposed model uses functional principal components to accommodate noisy longitudinal data and weighting to deal with missingness and sampling bias. In a series of simulation experiments and in an application to data on complex patients with diabetes, the proposed method demonstrates higher predictive accuracy and lower cost than competing methods.
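The abstract describes the three-way sequential decision only at a high level. The following is a minimal sketch of that general idea, not the authors' model: it assumes some fitted model returns a risk probability from the measurements collected so far, and that fixed lower and upper thresholds separate low-risk, uncertain, and high-risk. The names `classify_at_visit`, `sequential_classify`, and `toy_risk`, the thresholds 0.2 and 0.8, and the toy logistic risk function centered at an HbA1c of 8.0 are all hypothetical. The published method instead builds its risk estimate from functional principal component scores with weighting for missingness and sampling bias.

```python
import math
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Labels for the three-way decision described in the abstract.
HIGH, LOW, UNCERTAIN = "high-risk", "low-risk", "uncertain"


def classify_at_visit(risk: float, lower: float, upper: float) -> str:
    """Map a predicted risk probability to one of three classes.

    `lower` and `upper` are hypothetical decision thresholds (not from the paper).
    """
    if risk >= upper:
        return HIGH
    if risk <= lower:
        return LOW
    return UNCERTAIN


def sequential_classify(
    measurements: Sequence[float],
    risk_model: Callable[[List[float]], float],
    lower: float = 0.2,
    upper: float = 0.8,
) -> Tuple[str, int]:
    """Collect measurements one visit at a time until a decision is reached.

    Returns the final label and the number of measurements used, a crude proxy
    for collection cost. A patient still uncertain after all available
    measurements stays 'uncertain' (i.e., would continue to be monitored).
    """
    history: List[float] = []
    for value in measurements:
        history.append(value)  # one more (costly) biomarker measurement
        label = classify_at_visit(risk_model(history), lower, upper)
        if label != UNCERTAIN:
            return label, len(history)  # stop collecting once decided
    return UNCERTAIN, len(history)


def toy_risk(history: List[float]) -> float:
    """Hypothetical stand-in risk model: logistic in the mean HbA1c so far."""
    z = (mean(history) - 8.0) * 3.0
    return 1.0 / (1.0 + math.exp(-z))
```

For example, `sequential_classify([7.9, 9.5, 10.0], toy_risk)` returns `("high-risk", 2)`: the first measurement leaves the patient uncertain, the second triggers a high-risk decision, and the third measurement is never collected. `sequential_classify([8.0, 8.0, 8.0], toy_risk)` returns `("uncertain", 3)` because the toy risk stays at 0.5.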