Liu Ruoqi, Chen Pin-Yu, Zhang Ping
Department of Computer Science and Engineering, The Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA.
IBM Research, Yorktown Heights, NY 10598, USA.
Patterns (N Y). 2024 May 1;5(6):100973. doi: 10.1016/j.patter.2024.100973. eCollection 2024 Jun 14.
Treatment effect estimation (TEE) aims to identify the causal effects of treatments on important outcomes. Current machine-learning-based methods, which are mainly trained on labeled data for specific treatments or outcomes, can be sub-optimal when labeled data are limited. In this article, we propose CURE (causal treatment effect estimation), a new pre-training and fine-tuning framework for TEE from observational data. CURE is pre-trained on large-scale unlabeled patient data to learn representative contextual patient representations and is then fine-tuned on labeled patient data for TEE. We also present a new sequence encoding approach for longitudinal patient data that embeds both structural and temporal information. Evaluated on four downstream TEE tasks, CURE outperforms state-of-the-art methods, with a 7% increase in the area under the precision-recall curve (AUPRC) and an 8% increase in the influence-function-based precision of estimating heterogeneous effects. Validation against four randomized clinical trials confirms its efficacy in producing trial conclusions, highlighting CURE's capacity to supplement traditional clinical trials.
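One of the headline metrics, the area under the precision-recall curve, is commonly computed as average precision. The following is a minimal, dependency-free sketch of that standard metric. It assumes binary outcome labels in {0, 1} and breaks tied scores by input order. It is not taken from the CURE code, and the function name `average_precision` is hypothetical. For reported results, a library implementation such as scikit-learn's `average_precision_score` is the usual choice, and it can differ from this sketch when scores are tied.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Hypothetical helper for illustration. It assumes binary labels in {0, 1}.
    Tied scores are ordered by input position, because Python's sort is stable.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")

    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            # Each positive adds one recall step of width 1/n_pos,
            # weighted by the precision at that rank.
            total += hits / rank
    return total / n_pos
```

For example, `average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])` yields precision 1 at the first positive and 2/3 at the second. The result is their mean, 5/6 (about 0.833).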