Zhang Jiancheng, Xu Yonghui, Ye Bicui, Zhao Yibowen, Sun Xiaofang, Meng Qi, Zhang Yang, Cui Lizhen
Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Jinan, China.
School of Software, Shandong University, Jinan, China.
Health Inf Sci Syst. 2023 Nov 14;11(1):53. doi: 10.1007/s13755-023-00256-5. eCollection 2023 Dec.
Patient representation learning aims to encode meaningful information from a patient's Electronic Health Records (EHR) as a mathematical representation. Recent advances in deep learning have given patient representation learning methods greater representational power, allowing the learned representations to significantly improve the performance of disease prediction models. However, the inherent shortcomings of deep learning models, such as the need for massive amounts of labeled data and their lack of explainability, limit further performance gains for deep learning-based patient representation learning methods. In particular, learning robust patient representations is challenging when patient data is missing or insufficient. Although data augmentation techniques can address this deficiency, the complex data processing they involve further weakens the explainability of patient representation learning models. To address these challenges, this paper proposes Explainable and Augmented Patient Representation Learning for disease prediction (EAPR). EAPR uses data augmentation controlled by a confidence interval to enhance patient representations when patient data is limited. Moreover, EAPR uses two-stage gradient backpropagation to address the loss of explainability that the complex data augmentation process causes in patient representation learning models. Experimental results on real clinical data validate the effectiveness and explainability of the proposed approach.