Poulain Raphael, Gupta Mehak, Foraker Randi, Beheshti Rahmatollah
University of Delaware.
Washington University in St. Louis.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2021 Dec;2021:726-731. doi: 10.1109/bibm52615.2021.9669441. Epub 2022 Jan 14.
Machine learning algorithms have been widely used to capture the static and temporal patterns within electronic health records (EHRs). While many studies focus on the (primary) prevention of diseases, primordial prevention (preventing the factors that are known to increase the risk of a disease occurring) remains largely under-investigated. In this study, we propose a multi-target regression model that leverages transformers to learn bidirectional representations of EHR data and predict the future values of 11 major modifiable risk factors of cardiovascular disease (CVD). Inspired by the proven results of pre-training in natural language processing, we apply the same principles to EHR data, dividing the training of our model into two phases: pre-training and fine-tuning. We then use the fine-tuned transformer in a multi-target regression setting: we combine the 11 otherwise disjoint prediction tasks by adding shared and target-specific layers to the model, and train the entire model jointly. We evaluate the proposed method on a large publicly available EHR dataset. Through various experiments, we demonstrate that it improves significantly over the baselines, with an average improvement of 12.6% in mean absolute error (MAE) across all 11 outputs.
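The combination of shared and target-specific layers described in the abstract can be illustrated with a minimal NumPy sketch of the forward pass. The abstract does not give layer sizes, activations, or the training loss, so everything concrete here is an assumption made for illustration: the names `init_params`, `forward`, and `joint_mae`, the dimensions, the ReLU activations, and the use of an unweighted mean of per-target MAE as the joint objective. The random matrix `h` stands in for the patient representations that a fine-tuned transformer encoder would produce.

```python
import numpy as np

N_TARGETS = 11  # number of risk factors predicted, per the abstract


def init_params(rng, d_in, d_shared, d_head, n_targets=N_TARGETS):
    """Random weights for one shared layer and one small head per target.

    All sizes are hypothetical; the abstract does not specify them.
    """
    return {
        "W_shared": rng.normal(0.0, 0.1, (d_in, d_shared)),
        "b_shared": np.zeros(d_shared),
        "heads": [
            {
                "W1": rng.normal(0.0, 0.1, (d_shared, d_head)),
                "b1": np.zeros(d_head),
                "W2": rng.normal(0.0, 0.1, (d_head, 1)),
                "b2": np.zeros(1),
            }
            for _ in range(n_targets)
        ],
    }


def forward(params, h):
    """Map encoder outputs h (batch, d_in) to predictions (batch, n_targets)."""
    # Shared layer: every target sees the same intermediate representation.
    shared = np.maximum(0.0, h @ params["W_shared"] + params["b_shared"])
    outputs = []
    for head in params["heads"]:
        # Target-specific layers: each risk factor has its own small regressor.
        z = np.maximum(0.0, shared @ head["W1"] + head["b1"])
        outputs.append(z @ head["W2"] + head["b2"])  # shape (batch, 1)
    return np.concatenate(outputs, axis=1)


def joint_mae(pred, target):
    """Return per-target MAE and their unweighted mean (assumed joint loss)."""
    per_target = np.abs(pred - target).mean(axis=0)
    return per_target, per_target.mean()


rng = np.random.default_rng(0)
params = init_params(rng, d_in=32, d_shared=16, d_head=8)
h = rng.normal(size=(4, 32))          # stand-in for transformer patient embeddings
y = rng.normal(size=(4, N_TARGETS))   # stand-in for future risk-factor values
pred = forward(params, h)
per_target, loss = joint_mae(pred, y)
print(pred.shape, per_target.shape)
```

Because all 11 heads read from the same shared representation, minimizing a single combined loss would update the shared layer (and, in the full model, the encoder beneath it) with signal from every target at once, which is the sense in which the tasks are trained jointly. This sketch covers only the forward pass and the loss; gradient-based training would normally be done in an automatic-differentiation framework.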