High-dimensional Iterative Causal Forest (hdiCF) for Subgroup Identification Using Health Care Claims Data.

Author information

Wang Tiansheng, Pate Virginia, Wyss Richard, Buse John B, Kosorok Michael R, Stürmer Til

Affiliations

Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC.

Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Boston, MA.

Publication information

Am J Epidemiol. 2024 Sep 5. doi: 10.1093/aje/kwae322.

Abstract

We recently developed a machine-learning subgrouping algorithm, iterative causal forest (iCF), to identify subgroups with heterogeneous treatment effects (HTEs) using predefined covariates. However, such predefined covariates may miss or poorly define important features, leading to inaccurate subgrouping. To address such limitations, we developed a new semi-automatic subgrouping algorithm, hdiCF, which adapts methodology from the high-dimensional propensity score (hdPS) for feature recognition in claims data. The hdiCF algorithm has 3 steps: 1) high-dimensional feature identification by International Classification of Diseases, Current Procedural Terminology, and Anatomical Therapeutic Chemical codes (inpatient/outpatient diagnoses, procedures, prescriptions) and creation of ordinal variables by frequency of occurrence; 2) propensity score trimming and high-dimensional feature preparation; 3) iCF implementation to identify subgroups. We applied hdiCF in a 20% random sample of fee-for-service Medicare beneficiaries who initiated sodium-glucose cotransporter-2 inhibitors (SGLT2i) or glucagon-like peptide-1 receptor agonists to identify subgroups with HTEs for the incidence of hospitalized heart failure. HdiCF findings were consistent with studies suggesting that SGLT2i are more beneficial for patients with pre-existing heart failure or chronic kidney disease. HdiCF is not dependent on prior hypotheses about HTEs and identifies subgroups with markers for potential HTEs in real-world evidence studies where active-comparator, new-user study designs limit the potential for unmeasured confounding.
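Steps 1 and 2 of the algorithm described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes the hdPS convention of grading each code's baseline frequency against the median and 75th percentile of its non-zero per-patient counts, collapsed into one ordinal level (0 = absent, then one level for each threshold reached: at least one record, at least the median, at least the 75th percentile). For step 2 it uses asymmetric trimming (drop patients below the 1st percentile of the treated group's propensity score or above the 99th percentile of the comparator group's) as one example of a trimming rule. The function names, input layout, and cut points are hypothetical; propensity scores are taken as given rather than estimated, and step 3 (iCF) is not shown.

```python
import math
from collections import Counter, defaultdict


def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default convention), q in [0, 100]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty input")
    pos = (len(xs) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def ordinal_code_features(claims, patients):
    """Step 1 (assumed hdPS-style grading): one ordinal feature per code.

    claims:   iterable of (patient_id, code) records from the baseline period,
              e.g. ICD diagnosis, CPT procedure, or ATC drug codes.
    patients: all patient ids in the cohort (patients without the code get 0).
    Returns {code: {patient_id: level}}, where level counts the thresholds
    reached: >= 1 record, >= median, >= 75th percentile. The median and
    75th percentile are computed among patients with at least one record.
    """
    counts = defaultdict(Counter)  # code -> Counter(patient_id -> n records)
    for pid, code in claims:
        counts[code][pid] += 1

    features = {}
    for code, per_patient in counts.items():
        observed = list(per_patient.values())
        median = percentile(observed, 50)
        q75 = percentile(observed, 75)
        levels = {}
        for pid in patients:
            n = per_patient.get(pid, 0)
            if n == 0:
                levels[pid] = 0
            elif n >= q75:
                levels[pid] = 3
            elif n >= median:
                levels[pid] = 2
            else:
                levels[pid] = 1
        features[code] = levels
    return features


def asymmetric_trim(ps, treated, lower_pct=1.0, upper_pct=99.0):
    """Step 2 (one possible trimming rule): return indices of patients to keep.

    Drops patients whose PS is below the lower_pct percentile among the
    treated or above the upper_pct percentile among the comparator group.
    ps and treated (1/0 or True/False) are parallel lists.
    """
    ps_treated = [p for p, t in zip(ps, treated) if t]
    ps_comparator = [p for p, t in zip(ps, treated) if not t]
    low = percentile(ps_treated, lower_pct)
    high = percentile(ps_comparator, upper_pct)
    return [i for i, p in enumerate(ps) if low <= p <= high]


if __name__ == "__main__":
    claims = [("A", "E11"), ("B", "E11"), ("B", "E11")] + [("C", "E11")] * 4
    feats = ordinal_code_features(claims, ["A", "B", "C", "D"])
    print(feats["E11"])  # {'A': 1, 'B': 2, 'C': 3, 'D': 0}

    ps = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    treated = [0, 1, 0, 1, 0, 1, 0, 1]
    print(asymmetric_trim(ps, treated))  # [2, 3, 4, 5]
```

Because the three hdPS-style indicators are nested, a tree that splits the ordinal variable at 0.5, 1.5, or 2.5 recovers any one of them, so collapsing them into a single column loses little while keeping the feature count per code at one. The thresholds and trimming percentiles used in the published hdiCF analysis may differ from those assumed here.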
