Schuessler Maximilian, Fleming Scott, Meyer Shannon, Seto Tina, Hernandez-Boussard Tina
Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
Department of Statistics, Stanford University, Stanford, CA, USA.
Commun Med (Lond). 2025 Jul 1;5(1):261. doi: 10.1038/s43856-025-00965-w.
Real-world medical environments such as oncology are highly dynamic due to rapid changes in medical practice, technologies, and patient characteristics. This variability, if not addressed, can cause data shifts that degrade model performance. Presently, there are few easy-to-implement, model-agnostic diagnostic frameworks to vet machine learning models for future applicability and temporal consistency.
We extracted clinical data from electronic health records (EHR) for a cohort of over 24,000 patients who received antineoplastic therapy within a distinct year. The labels in this study are acute care utilization (ACU) events, i.e., emergency department visits and hospitalizations, within 180 days of treatment initiation. Our cross-sectional data span treatment initiation points from 2010 to 2022. We implemented three models within our validation framework: Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest (RF), and Extreme Gradient Boosting (XGBoost).
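As a rough illustration of the label definition, the following minimal sketch marks a patient as positive when any acute care event falls within 180 days of treatment initiation. The function name `acu_label`, its arguments, and the inclusive handling of day 0 and day 180 are assumptions made for illustration; the abstract does not specify these details.

```python
from datetime import date, timedelta


def acu_label(treatment_start, event_dates, window_days=180):
    """Return 1 if any event date lies within the window after treatment start.

    Hypothetical helper: both window boundaries are treated as inclusive,
    which is an assumption and not stated in the source.
    """
    window_end = treatment_start + timedelta(days=window_days)
    return int(any(treatment_start <= d <= window_end for d in event_dates))


# Example: one emergency department visit two months after treatment start.
start = date(2015, 1, 1)
print(acu_label(start, [date(2015, 3, 1)]))
```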
Here, we introduce a model-agnostic diagnostic framework to validate clinical machine learning models on time-stamped data, consisting of four stages. First, the framework evaluates performance by partitioning data from multiple years into training and validation cohorts. Second, it characterizes the temporal evolution of patient outcomes and characteristics. Third, model longevity and trade-offs between data quantity and recency are explored. Finally, feature importance and data valuation algorithms are applied for feature reduction and data quality assessment. When applied to predicting ACU in cancer patients, the framework highlights fluctuations in features, labels, and data values over time.
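The first stage, partitioning time-stamped data by year into training and validation cohorts, might be sketched as below. This is one possible scheme (a fixed-length window of preceding years used to predict the following year), not necessarily the partitioning used in the study; `temporal_splits`, `year_key`, and `train_years` are hypothetical names.

```python
def temporal_splits(records, year_key="year", train_years=3):
    """Yield (test_year, train_rows, test_rows) for each year with enough history.

    Training rows come only from the `train_years` years immediately
    preceding the test year, so no future data leaks into training.
    """
    years = sorted({r[year_key] for r in records})
    for i in range(train_years, len(years)):
        window = set(years[i - train_years:i])
        test_year = years[i]
        train = [r for r in records if r[year_key] in window]
        test = [r for r in records if r[year_key] == test_year]
        yield test_year, train, test


# Toy data: two records per treatment-initiation year, 2010 through 2015.
records = [{"year": y, "id": k} for y in range(2010, 2016) for k in range(2)]
for test_year, train, test in temporal_splits(records):
    print(test_year, len(train), len(test))
```

Varying `train_years` in such a scheme is one way to probe the trade-off between data quantity and recency described in the third stage.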
This study emphasizes the importance of data timeliness and relevance. The results on ACU in cancer patients show moderate signs of drift and corroborate the relevance of temporal considerations when validating machine learning models for deployment at the point of care.