Fu Tianfan, Hoang Trong Nghia, Xiao Cao, Sun Jimeng
Department of Computational Science and Engineering, Georgia Institute of Technology.
MIT-IBM Watson AI Lab, IBM Research.
IJCAI (U S). 2019 Aug;2019:5857-5863. doi: 10.24963/ijcai.2019/812.
Predictive phenotyping is the task of accurately predicting which phenotypes will occur at the next clinical visit based on longitudinal Electronic Health Record (EHR) data. While deep learning (DL) models have recently demonstrated strong performance in predictive phenotyping, they require a large amount of labeled data, which is expensive to acquire. To address this label scarcity, we propose a deep dictionary learning framework (DDL) for phenotyping, which uses unlabeled data as a complementary source of information to generate a better, more succinct data representation. Our empirical evaluations on multiple EHR datasets demonstrate that DDL outperforms existing predictive phenotyping methods on a wide variety of clinical tasks that require patient phenotyping. The results also show that unlabeled data can be used to generate a better data representation, which helps DDL improve phenotyping performance over existing methods that use only labeled data.