Zimbalist Alexa, Radimer Kelly H, Ergas Isaac J, Roh Janise M, Quesenberry Charles P, Kwan Marilyn L, Kushi Lawrence H
Division of Research, Kaiser Permanente Northern California, 4480 Hacienda Drive, Pleasanton, CA 94588, United States.
Ann Epidemiol. 2025 Apr;104:55-60. doi: 10.1016/j.annepidem.2025.02.013. Epub 2025 Mar 4.
Traditional methods for handling missing data rely on assumptions about missing-data patterns. Locally estimated scatterplot smoothing (LOESS) regression models were explored as a data-driven way to minimize missing weight data in a longitudinal cohort of breast cancer patients.
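To make the idea of a LOESS estimate concrete, here is a minimal per-patient sketch. It assumes each woman's outpatient weights are smoothed over time (months relative to diagnosis) with local linear fits and tricube weights, the classic LOESS recipe. The names `loess_estimate` and `tricube`, the `span` default, and the rule of returning `None` when no stable local line can be fit are hypothetical illustrations. The abstract does not report the authors' software, span, or availability rules, so this is not their implementation.

```python
from typing import Optional, Sequence


def tricube(u: float) -> float:
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, zero elsewhere."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_estimate(
    months: Sequence[float],
    kg: Sequence[float],
    t0: float,
    span: float = 0.75,  # hypothetical smoothing parameter (fraction of points)
) -> Optional[float]:
    """Local linear LOESS estimate of one patient's weight at month t0.

    Returns None (a hypothetical rule) when fewer than two distinct
    time points receive positive weight, so no local line can be fit.
    """
    n = len(months)
    if n < 2:
        return None
    # The k points nearest t0 define the bandwidth h.
    k = min(n, max(2, round(span * n)))
    h = sorted(abs(m - t0) for m in months)[k - 1]
    if h == 0:
        return None
    # Tricube weights shrink with distance from t0 (zero at distance >= h).
    w = [tricube((m - t0) / h) for m in months]
    # Centre time on t0 so the fitted intercept is the estimate at t0.
    x = [m - t0 for m in months]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, kg))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, kg))
    det = sw * sxx - sx * sx
    if sw == 0 or abs(det) < 1e-12:
        return None
    # Weighted least squares for kg = intercept + slope * (month - t0).
    slope = (sw * sxy - sx * sy) / det
    return (sy - slope * sx) / sw
```

For example, `{t: loess_estimate(months, kg, t) for t in (0, 6, 12, 24, 48, 72, 96)}` would give one estimate per study time point. A real analysis would also need rules about when an estimate may be reported at all, for instance requiring measurements reasonably close to `t0` rather than extrapolating. The abstract does not describe those rules.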
Outpatient weights recorded from 2 years before to 10 years after breast cancer diagnosis were extracted from electronic health records (EHRs) for 10,778 women diagnosed with invasive breast cancer from 2005 to 2013 at Kaiser Permanente. LOESS regression models estimated weights at baseline (breast cancer diagnosis) and at 6 follow-up time points (6, 12, 24, 48, 72, and 96 months post-baseline). The weights identified by the LOESS models were compared with those identified by the closest-available method, which selects the weight measurement closest to each time point within a specified time window.
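The closest-available comparison method can be sketched in a few lines. This is a minimal illustration under stated assumptions: `closest_available` is a hypothetical name, each patient's measurements are assumed to be `(month relative to diagnosis, kg)` pairs, and the `window` half-width is a placeholder because the abstract does not give the actual window. Ties are broken by keeping the earlier-listed measurement, which is also an assumption.

```python
from typing import Optional, Sequence, Tuple

# Baseline plus the six follow-up time points named in the abstract (months).
TIMEPOINTS = (0, 6, 12, 24, 48, 72, 96)


def closest_available(
    measurements: Sequence[Tuple[float, float]],  # (month relative to diagnosis, kg)
    target: float,
    window: float,  # hypothetical half-width in months; not given in the abstract
) -> Optional[float]:
    """Weight measured closest to `target`, or None if none lies within +/- window."""
    best_kg: Optional[float] = None
    best_dist: Optional[float] = None
    for month, kg in measurements:
        dist = abs(month - target)
        # Strict '<' keeps the earlier-listed measurement when distances tie.
        if dist <= window and (best_dist is None or dist < best_dist):
            best_kg, best_dist = kg, dist
    return best_kg
```

Calling `{t: closest_available(visits, t, window=3) for t in TIMEPOINTS}` yields one weight or `None` per time point. A `None` is exactly the kind of missing value that a smoothing approach such as LOESS can sometimes fill when measurements exist outside the window.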
Compared with the closest-available method, LOESS models identified fewer weights at baseline and at 6 months post-baseline but significantly more weights at later follow-up time points. At all time points, more than 80% of the weights identified by both approaches differed by 2.50 kilograms or less.
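The agreement figure can be computed as a simple proportion. This sketch assumes that "weights identified by both approaches" means records where both methods produced a value at the same time point. The function name `share_within` and the input layout (parallel lists with `None` for a missing weight) are hypothetical. The numbers in any usage are toy values, not study data.

```python
from typing import Optional, Sequence


def share_within(
    loess_kg: Sequence[Optional[float]],
    closest_kg: Sequence[Optional[float]],
    tol: float = 2.50,  # agreement threshold in kilograms, as in the abstract
) -> Optional[float]:
    """Among records where both methods gave a weight, the share differing by <= tol kg."""
    # Keep only pairs where neither method is missing.
    pairs = [
        (a, b)
        for a, b in zip(loess_kg, closest_kg)
        if a is not None and b is not None
    ]
    if not pairs:
        return None
    close = sum(1 for a, b in pairs if abs(a - b) <= tol)
    return close / len(pairs)
```

Running this separately for each time point would give one agreement share per time point, the quantity the abstract summarizes as exceeding 80%.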
LOESS regression makes effective use of available longitudinal data and may be a beneficial tool to minimize missing longitudinal data in future EHR-based studies.