Uno Hajime, Ritzwoller Debra P, Cronin Angel M, Carroll Nikki M, Hornbrook Mark C, Hassett Michael J
Hajime Uno, Angel M. Cronin, and Michael J. Hassett, Dana-Farber Cancer Institute, Boston, MA; Debra P. Ritzwoller and Nikki M. Carroll, Kaiser Permanente Colorado, Denver, CO; and Mark C. Hornbrook, Kaiser Permanente Center for Health Research, Portland, OR.
JCO Clin Cancer Inform. 2018 Dec;2:1-10. doi: 10.1200/CCI.17.00163.
Data from claims and electronic medical records (EMRs) are frequently used to identify clinical events (eg, cancer diagnosis, stroke). However, accurately determining the time of clinical events can be challenging, and the methods used to generate time estimates are underdeveloped. We sought to develop an approach to determine the time of a clinical event, cancer recurrence, using high-dimensional longitudinal structured data.
Manual chart abstraction provided information regarding the actual time of cancer recurrence. These data were linked to claims from Medicare or structured EMR data from the Cancer Research Network, which were used to determine time of recurrence for patients with lung or colorectal cancer. We analyzed the longitudinal profile of codes that could help determine the time of recurrence, adjusted for systematic differences between code dates and recurrence dates, and integrated time estimates from different codes to empirically derive an optimal algorithm.
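The two adjustment steps described above (correcting for the systematic gap between code dates and recurrence dates, then integrating estimates across code groups) can be illustrated with a minimal Python sketch. This is not the published algorithm: the abstract does not specify the estimator, so the median offset, the simple average across groups, the code-group names (`chemotherapy`, `imaging`, `hospice`), and all data values are assumptions made purely for illustration.

```python
from statistics import mean, median

# Hypothetical data: times in months from initial diagnosis.
# "recurrence" is the chart-abstracted time; "codes" holds the time at which
# a code from each (assumed) group was first observed, or None if never seen.
training = [
    {"recurrence": 10.0, "codes": {"chemotherapy": 11.0, "imaging": 9.5, "hospice": 18.0}},
    {"recurrence": 20.0, "codes": {"chemotherapy": 21.5, "imaging": 19.0, "hospice": None}},
    {"recurrence": 30.0, "codes": {"chemotherapy": 31.0, "imaging": None, "hospice": 36.0}},
]


def estimate_offsets(patients):
    """Estimate, per code group, the median of (code time - recurrence time)."""
    diffs = {}
    for patient in patients:
        for group, code_time in patient["codes"].items():
            if code_time is not None:
                diffs.setdefault(group, []).append(code_time - patient["recurrence"])
    return {group: median(values) for group, values in diffs.items()}


def predict(codes, offsets, groups):
    """Average the offset-adjusted code times over the selected code groups."""
    adjusted = [
        codes[g] - offsets[g]
        for g in groups
        if codes.get(g) is not None and g in offsets
    ]
    return mean(adjusted) if adjusted else None


def mean_abs_error(patients, offsets, groups):
    """Mean absolute difference between predicted and chart-abstracted times."""
    errors = []
    for patient in patients:
        estimate = predict(patient["codes"], offsets, groups)
        if estimate is not None:
            errors.append(abs(estimate - patient["recurrence"]))
    return mean(errors) if errors else None


if __name__ == "__main__":
    offsets = estimate_offsets(training)
    for groups in (["chemotherapy"], ["chemotherapy", "imaging"]):
        print(groups, mean_abs_error(training, offsets, groups))
```

Comparing the error across candidate subsets of code groups, as the loop at the end does, mirrors the idea of empirically selecting a combination. In this sketch the error is computed on the same records used to estimate the offsets, so it would be optimistic; a held-out set would be needed for an honest estimate.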
We identified twelve code groups that could help determine the time of recurrence. Using claims data for patients with lung cancer, the optimal algorithm consisted of three code groups and provided an average prediction error of 4.8 months. Using EMR data or applying this approach to patients with colorectal cancer yielded similar results.
Time estimates improved when we selected codes that were not necessarily the same as those used to identify recurrence, combined time estimates from multiple code groups, and adjusted for systematic bias between code dates and recurrence dates. Improving the accuracy of time estimates for clinical events can facilitate research, quality measurement, and process improvement.