Martín-Merino Elisa, Calderón-Larrañaga Amaia, Hawley Samuel, Poblador-Plou Beatriz, Llorente-García Ana, Petersen Irene, Prieto-Alhambra Daniel
Base de datos para la Investigación Farmacoepidemiológica en Atención Primaria, Division of Pharmacoepidemiology and Pharmacovigilance, Spanish Agency of Medicines and Medical Devices, Madrid, Spain.
Aging Research Center, Karolinska Institutet, Stockholm University, Stockholm, Sweden.
Clin Epidemiol. 2018 Jun 5;10:643-654. doi: 10.2147/CLEP.S154914. eCollection 2018.
Missing data are a common problem in research based on electronic medical records (EMRs), and drug safety studies handle them in many different ways.
The aim was to compare the risk estimates obtained with different strategies for handling missing data in a study of venous thromboembolism (VTE) risk associated with antiosteoporotic medications (AOM).
New users of AOM (alendronic acid, other bisphosphonates, strontium ranelate, selective estrogen receptor modulators, teriparatide, or denosumab) aged ≥50 years during 1998-2014 were identified in two Spanish EMR databases (the Base de datos para la Investigación Farmacoepidemiológica en Atención Primaria [BIFAP] and the EpiChron cohort) and one UK EMR database (Clinical Practice Research Datalink [CPRD]). Hazard ratios (HRs) for each AOM, with alendronic acid as the reference, were calculated with adjustment for VTE risk factors, body mass index (missing in 61% of patients across the three databases), and smoking (missing in 23% of patients), as recorded in the year of AOM therapy initiation. HRs and standard errors obtained with cross-sectional multiple imputation (MI), the reference method, were compared with those from complete case (CC) analysis, which uses only patients with complete data, and from longitudinal MI, which adds to the cross-sectional MI model the body mass index and smoking values recorded in the year before and the year after therapy initiation.
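As an illustration of how HRs and standard errors from multiply imputed datasets are typically combined, the following minimal sketch pools per-imputation log-HRs with Rubin's rules. The abstract does not state which pooling method the authors used, so Rubin's rules are an assumption here, as are the function name `pool_rubin`, the input values, and the use of a normal-approximation 95% confidence interval (a full analysis would normally use a t-distribution with Rubin's degrees of freedom).

```python
import math
from statistics import mean, variance


def pool_rubin(log_hrs, std_errors):
    """Pool per-imputation log hazard ratios with Rubin's rules.

    Returns the pooled log-HR and its pooled standard error.
    """
    m = len(log_hrs)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching lengths")
    pooled = mean(log_hrs)                         # average point estimate
    within = mean([se ** 2 for se in std_errors])  # mean within-imputation variance
    between = variance(log_hrs)                    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between         # Rubin's total variance
    return pooled, math.sqrt(total)


# Hypothetical log-HRs and standard errors from five imputed datasets
# (illustrative values, not taken from the study).
log_hrs = [0.10, 0.14, 0.08, 0.12, 0.11]
std_errors = [0.05, 0.05, 0.06, 0.05, 0.05]

pooled_log_hr, pooled_se = pool_rubin(log_hrs, std_errors)
hr = math.exp(pooled_log_hr)
# Normal-approximation interval, a simplification of the usual t-based interval.
lower = math.exp(pooled_log_hr - 1.96 * pooled_se)
upper = math.exp(pooled_log_hr + 1.96 * pooled_se)
print(f"pooled HR {hr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The pooled standard error is always at least as large as the square root of the average within-imputation variance, because the between-imputation term carries the extra uncertainty due to the missing values.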
Overall, 422/95,057 (0.4%), 19/12,688 (0.1%), and 2,051/161,202 (1.3%) VTE cases/participants were observed in BIFAP, EpiChron, and CPRD, respectively. Compared with cross-sectional MI, CC analysis produced HRs ranging from a 100.00% underestimation to a 40.31% overestimation, whereas longitudinal MI gave similar risk estimates. Precision of the HRs improved by up to 160.28% with cross-sectional MI versus CC, while longitudinal MI improved precision only minimally over cross-sectional MI (up to 0.80%).
CC analysis may substantially affect relative risk estimates in EMR-based drug safety studies, since data are often not missing completely at random. In these data, longitudinal MI offered little improvement in power over cross-sectional MI. The strategy for handling missing data in drug safety studies can have a large impact on both risk estimates and precision.
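The percentages reported above can be reproduced with simple arithmetic. The sketch below recomputes the proportion of participants with VTE from the counts given in the abstract, and shows one plausible way to express over- or underestimation of an HR relative to a reference method. The relative-difference formula, the function names, and the example HR values are assumptions for illustration; the abstract does not give the exact definition the authors used.

```python
def cumulative_incidence_pct(cases, participants):
    """Percentage of participants who had the event."""
    return 100.0 * cases / participants


def relative_difference_pct(hr_method, hr_reference):
    """Relative difference of an HR versus a reference HR, in percent.

    Negative values mean underestimation, positive values overestimation.
    Assumed definition; the abstract does not specify the formula.
    """
    return 100.0 * (hr_method - hr_reference) / hr_reference


# VTE cases and participants as reported in the abstract.
vte_counts = {
    "BIFAP": (422, 95_057),
    "EpiChron": (19, 12_688),
    "CPRD": (2_051, 161_202),
}
for database, (cases, participants) in vte_counts.items():
    pct = cumulative_incidence_pct(cases, participants)
    print(f"{database}: {pct:.1f}%")

# Hypothetical HRs (not from the study): complete case versus cross-sectional MI.
print(f"{relative_difference_pct(1.40, 1.00):+.2f}%")
print(f"{relative_difference_pct(0.00, 1.50):+.2f}%")
```

Under this assumed definition, an underestimation of 100% corresponds to a complete case HR of zero against a non-zero reference HR.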