Chubak Jessica, Onega Tracy, Zhu Weiwei, Buist Diana S M, Hubbard Rebecca A
*Group Health Research Institute; †Department of Epidemiology, University of Washington, Seattle, WA; ‡Department of Community and Family Medicine, The Dartmouth Institute for Health Policy and Clinical Practice, and Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH; §Department of Health Services, University of Washington, Seattle, WA; ∥Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA.
Med Care. 2017 Dec;55(12):e81-e87. doi: 10.1097/MLR.0000000000000352.
Studies of cancer recurrences and second primary tumors require information on outcome dates. Little is known about how well electronic health record-based algorithms can identify dates or how errors in dates can bias analyses.
We assessed rule-based and model-fitting approaches to assign event dates using a previously published electronic health record-based algorithm for second breast cancer events (SBCE). We conducted a simulation study to assess bias due to date assignment errors in time-to-event analyses.
From a cohort of 3152 early-stage breast cancer patients, 358 women accurately identified as having had an SBCE served as the basis for this analysis.
The primary measure of accuracy was the percentage of predicted SBCE dates falling within ±60 days of the true date. In the simulation study, bias in hazard ratios (HRs) was estimated as the average difference between the HR based on algorithm-assigned dates and the true HR across 1000 simulations, each with a simulated sample size of N=4000.
The most accurate date algorithm had a median difference between the true and predicted dates of 0 days with 82% of predicted dates falling within 60 days of the true date. Bias resulted when algorithm sensitivity and specificity varied by exposure status, but was minimal when date assignment errors were of the magnitude observed for our date assignment method.
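The kind of simulation behind the bias finding can be sketched as follows. This is a simplified stand-in under stated assumptions, not the study's design: event times are exponential, the HR is estimated as a ratio of event rates (events divided by person-time) rather than with a Cox model, date errors are normally distributed, only sensitivity (not specificity) is varied, and missed events are treated as event-free through the end of follow-up. The function `simulate_hr_bias` and all of its parameter names and default values are hypothetical.

```python
import numpy as np


def simulate_hr_bias(true_hr=1.5, n=4000, n_sims=1000, base_hazard=0.05,
                     follow_up=10.0, date_error_sd_days=0.0,
                     sens_unexposed=1.0, sens_exposed=1.0, seed=0):
    """Average (estimated HR - true HR) across simulated cohorts.

    Assumptions: exponential event times (time in years), rate-ratio
    estimator in place of a Cox model, administrative censoring at follow_up.
    """
    rng = np.random.default_rng(seed)
    diffs = []
    for _ in range(n_sims):
        exposed = rng.integers(0, 2, size=n).astype(bool)
        hazard = np.where(exposed, base_hazard * true_hr, base_hazard)
        t_event = rng.exponential(1.0 / hazard)
        occurred = t_event <= follow_up

        # The algorithm detects each true event with exposure-specific sensitivity.
        sens = np.where(exposed, sens_exposed, sens_unexposed)
        detected = occurred & (rng.random(n) < sens)

        # Detected events receive an assigned date equal to truth plus error.
        error_years = rng.normal(0.0, date_error_sd_days, size=n) / 365.25
        t_assigned = np.clip(t_event + error_years, 1.0 / 365.25, None)

        # Everyone without a detected event contributes the full follow-up.
        time = np.where(detected, t_assigned, follow_up)

        rate_exposed = detected[exposed].sum() / time[exposed].sum()
        rate_unexposed = detected[~exposed].sum() / time[~exposed].sum()
        diffs.append(rate_exposed / rate_unexposed - true_hr)
    return float(np.mean(diffs))


# Nondifferential date error versus sensitivity that differs by exposure.
bias_date_error = simulate_hr_bias(n_sims=200, date_error_sd_days=60.0)
bias_differential = simulate_hr_bias(n_sims=200, sens_exposed=0.8)
```

Comparing the two calls illustrates the contrast drawn above: date errors alone shift person-time only slightly, whereas lower sensitivity in one exposure group removes events from that group and pulls the estimate away from the true HR.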
SBCE dates can be assigned with reasonable accuracy using a previously published algorithm. Although acceptable in many scenarios, algorithm-assigned dates are not appropriate when the algorithm's operating characteristics (sensitivity and specificity) are likely to vary by the study exposure.