Kaiser Permanente Washington Health Research Institute, Seattle, Washington, USA.
Department of Health Systems Science, Bernard J. Tyson Kaiser Permanente School of Medicine, Pasadena, California, USA.
Pharmacoepidemiol Drug Saf. 2024 Jan;33(1):e5734. doi: 10.1002/pds.5734. Epub 2023 Dec 19.
Observational studies assessing effects of medical products on suicidal behavior often rely on health record data to account for pre-existing risk. We assess whether high-dimensional models predicting suicide risk using data derived from insurance claims and electronic health records (EHRs) are superior to models using data from insurance claims alone.
Data from seven large health systems identified outpatient mental health visits by patients aged 11 or older between January 1, 2009 and September 30, 2017. Data for the 5 years prior to each visit identified potential predictors of suicidal behavior typically available from insurance claims (e.g., mental health diagnoses, procedure codes, medication dispensings) and additional potential predictors available from EHRs (self-reported race and ethnicity, responses to Patient Health Questionnaire or PHQ-9 depression questionnaires). Nonfatal self-harm events following each visit were identified from insurance claims data, and fatal self-harm events were identified by linkage to state mortality records. Random forest models predicting nonfatal or fatal self-harm over the 90 days following each visit were developed in a 70% random sample of visits and validated in a held-out sample of 30%. Performance of models using linked claims and EHR data was compared to that of models using claims data only.
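The outcome definition and sample split described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names `label_visit` and `split_visits`, the choice to count events strictly after the visit date through day 90, and the visit-level shuffle with a fixed seed are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta


def label_visit(visit_date, event_dates, window_days=90):
    """Return 1 if any self-harm event falls within window_days after the visit.

    Assumption: the window excludes the visit date itself and includes day 90.
    """
    window_end = visit_date + timedelta(days=window_days)
    return int(any(visit_date < d <= window_end for d in event_dates))


def split_visits(visit_ids, train_frac=0.7, seed=0):
    """Randomly split visits into development and held-out validation sets.

    Assumption: sampling is at the visit level, as the abstract's wording suggests.
    """
    rng = random.Random(seed)
    shuffled = list(visit_ids)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical visit with one event 59 days later.
    print(label_visit(date(2015, 1, 1), [date(2015, 3, 1)]))
    development, validation = split_visits(range(10))
    print(len(development), len(validation))
```

Because one patient can contribute many visits, a visit-level split places visits from the same patient in both samples; a patient-level split would be an alternative design, which the abstract does not describe.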
Among 15 845 047 encounters by 1 574 612 patients, 99 098 (0.6%) were followed by a self-harm event within 90 days. Overall classification performance did not differ between the best-fitting model using all data (area under the receiver operating characteristic curve or AUC = 0.846, 95% CI 0.839-0.854) and the best-fitting model limited to data available from insurance claims (AUC = 0.846, 95% CI 0.838-0.853). Competing models showed similar classification performance across a range of cut-points and similar calibration performance across a range of risk strata. Results were similar when the sample was limited to health systems and time periods where PHQ-9 depression questionnaires were recorded more frequently.
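For readers who want to reproduce this kind of comparison on their own data, the sketch below computes an AUC from predicted risks and a percentile bootstrap confidence interval using only the Python standard library. The abstract does not state how its confidence intervals were obtained, so the visit-level bootstrap here is an assumption, and the names `auc` and `bootstrap_auc_ci` and the synthetic scores are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pairs = sorted(zip(scores, labels))
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of 1-based ranks i+1 .. j for a tie group
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(scores, labels, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling visits with replacement."""
    auc(scores, labels)  # fail early if only one class is present
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[k] for k in idx]
        if 0 < sum(lab) < n:  # skip resamples that contain a single class
            stats.append(auc([scores[k] for k in idx], lab))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


if __name__ == "__main__":
    # Synthetic data only: two hypothetical models scored on the same visits.
    rng = random.Random(1)
    labels = [1 if rng.random() < 0.05 else 0 for _ in range(2000)]
    claims_only = [rng.gauss(1.4 * y, 1.0) for y in labels]
    claims_plus_ehr = [s + rng.gauss(0.0, 0.1) for s in claims_only]
    for name, s in [("claims only", claims_only), ("claims + EHR", claims_plus_ehr)]:
        print(name, round(auc(s, labels), 3), bootstrap_auc_ci(s, labels))
```

Resampling individual visits ignores the clustering of visits within patients; resampling patients instead would typically give wider intervals.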
Investigators using health record data to account for pre-existing risk in observational studies of suicidal behavior need not limit that research to databases including linked EHR data.