Liu Xin, Schnell Patrick M
Division of Biostatistics, College of Public Health, The Ohio State University.
Ann Appl Stat. 2025 Jun;19(2):1332-1361. doi: 10.1214/24-aoas2007. Epub 2025 May 28.
Electronic medical records (EMR) data contain rich information that can facilitate health-related studies but are collected primarily for purposes other than research. For recurrent events, EMR data often do not record event times or counts but only contain intermittently assessed and censored observations (i.e., upper and/or lower bounds for counts in a time interval) at uncontrolled times. This can result in non-contiguous or overlapping assessment intervals with censored event counts. Existing methods for analyzing intermittently assessed recurrent events assume disjoint assessment intervals with known counts (interval count data), reflecting their focus on prospective studies with controlled assessment times. We propose a Bayesian data augmentation method to analyze the complicated assessments in EMR data for recurrent events. Within a Gibbs sampler, event times are imputed by generating sets of event times from non-homogeneous Poisson processes and rejecting proposed sets that are incompatible with the constraints imposed by the assessment data. Based on the independent increments property of Poisson processes, we implement three techniques to speed up this rejection sampling imputation method for large EMR datasets: independent sampling by partitioning, truncated generation, and sequential sampling. In a simulation study, we show that our method accurately estimates the parameters of log-linear Poisson process intensities. Although the proposed method can be applied generally to EMR data on recurrent events, our study is specifically motivated by identifying risk factors for falls due to cancer treatment and its supportive medications. We used the proposed method to analyze an EMR dataset comprising 5501 patients treated for breast cancer. Our analysis provides evidence supporting associations between certain risk factors (including classes of medications) and risk of falls.
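The rejection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the intensity parameters `b0` and `b1`, and the example constraints are all hypothetical. Proposals are generated by thinning a homogeneous Poisson process, which is one standard way to simulate a non-homogeneous one and is assumed here for illustration. An intensity that is log-linear in time stands in for the paper's log-linear intensities. The surrounding Gibbs sampler and the three speed-up techniques are omitted.

```python
import math
import random


def simulate_nhpp(intensity, lam_max, t_end, rng):
    """Draw event times on [0, t_end) by thinning a homogeneous process.

    Assumes intensity(t) <= lam_max for every t in [0, t_end).
    """
    times = []
    t = rng.expovariate(lam_max)
    while t < t_end:
        # Keep each candidate time with probability intensity(t) / lam_max.
        if rng.random() < intensity(t) / lam_max:
            times.append(t)
        t += rng.expovariate(lam_max)
    return times


def satisfies(times, constraints):
    """Check count bounds; each constraint is (start, end, lower, upper)."""
    for start, end, lower, upper in constraints:
        count = sum(start <= t < end for t in times)
        if not (lower <= count <= upper):
            return False
    return True


def impute_event_times(intensity, lam_max, t_end, constraints, rng,
                       max_tries=100_000):
    """Propose whole sets of event times until one meets every constraint."""
    for _ in range(max_tries):
        proposal = simulate_nhpp(intensity, lam_max, t_end, rng)
        if satisfies(proposal, constraints):
            return proposal
    raise RuntimeError("no compatible proposal found within max_tries")


# Hypothetical example: intensity exp(b0 + b1 * t) on [0, 10).
b0, b1 = -1.0, 0.1
t_end = 10.0


def intensity(t):
    return math.exp(b0 + b1 * t)


lam_max = intensity(t_end)  # the intensity is increasing because b1 > 0

# Two overlapping assessment intervals with censored counts:
# at least one event in [0, 6) and at most two events in [4, 10).
constraints = [(0.0, 6.0, 1, math.inf), (4.0, 10.0, 0, 2)]

rng = random.Random(0)
times = impute_event_times(intensity, lam_max, t_end, constraints, rng)
print(times)
```

Each call rejects an entire proposed set when any single constraint fails, so the acceptance rate falls as constraints accumulate. This is the cost that the partitioning, truncated generation, and sequential sampling techniques named in the abstract are intended to reduce.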