Lee Hyun Gi, Sholle Evan, Beecy Ashley, Al'Aref Subhi, Peng Yifan
Department of Population Health Sciences, Weill Cornell Medicine.
Information Technologies and Services, Weill Cornell Medicine.
Proc Conf. 2021 Jun;2021:4533-4538. doi: 10.18653/v1/2021.naacl-main.358.
Utilizing clinical texts in survival analysis is difficult because they are largely unstructured. Current automatic extraction models fail to capture textual information comprehensively since their labels are limited in scope. Furthermore, they typically require a large amount of data and high-quality expert annotations for training. In this work, we present a novel method of using BERT-based hidden layer representations of clinical texts as covariates for proportional hazards models to predict patient survival outcomes. We show that hidden layers yield notably more accurate predictions than predefined features, outperforming the previous baseline model by 5.7% on average across C-index and time-dependent AUC. We make our work publicly available at https://github.com/bionlplab/heart_failure_mortality.
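The idea of feeding text representations into a proportional hazards model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (that is in the linked repository). Random vectors stand in for the BERT hidden-layer representations, the survival data are synthetic, and the function names `fit_cox`, `neg_partial_log_likelihood`, and `concordance_index` are hypothetical helpers written for this example. The model is a plain Cox regression fitted by gradient ascent on the partial likelihood, evaluated with Harrell's C-index.

```python
import numpy as np

def neg_partial_log_likelihood(beta, X, time, event):
    """Negative Cox partial log-likelihood (assumes no tied event times)."""
    order = np.argsort(-time)            # descending time: risk sets become prefixes
    X, event = X[order], event[order]
    eta = X @ beta                       # linear predictor (log relative hazard)
    log_risk = np.logaddexp.accumulate(eta)  # log sum of exp(eta) over each risk set
    return -np.sum((eta - log_risk)[event == 1])

def fit_cox(X, time, event, lr=0.1, steps=500, l2=0.01):
    """Fit Cox coefficients by gradient ascent on the L2-penalized mean partial likelihood."""
    order = np.argsort(-time)
    X, event = X[order], event[order].astype(bool)
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        eta = X @ beta
        w = np.exp(eta - eta.max())      # shift by the max for numerical stability
        denom = np.cumsum(w)
        # hazard-weighted mean of the covariates within each risk set
        weighted_mean = np.cumsum(w[:, None] * X, axis=0) / denom[:, None]
        grad = (X[event] - weighted_mean[event]).sum(axis=0) / event.sum() - l2 * beta
        beta += lr * grad
    return beta

def concordance_index(risk, time, event):
    """Harrell's C-index: fraction of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                     # a pair is anchored on an observed event
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Synthetic stand-in for text embeddings (assumption: in practice these would be
# hidden-layer vectors from a BERT-style encoder, one row per clinical note).
rng = np.random.default_rng(0)
n, d = 200, 8
X = rng.normal(size=(n, d))
true_beta = np.zeros(d)
true_beta[:2] = [1.0, -1.0]              # only two dimensions carry signal here

# Exponential survival times whose hazard depends on the embeddings, with random censoring.
time = rng.exponential(scale=np.exp(-(X @ true_beta)))
censor = rng.exponential(scale=2.0, size=n)
event = (time <= censor).astype(int)
observed = np.minimum(time, censor)

beta = fit_cox(X, observed, event)
c_index = concordance_index(X @ beta, observed, event)
print(f"C-index on the synthetic training data: {c_index:.3f}")
```

A real pipeline would replace `X` with representations extracted from the text encoder and would report the C-index on held-out patients rather than on the training data, as done here only to keep the sketch short.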