Fanconi Claudio, van Buchem Marieke, Hernandez-Boussard Tina
Stanford University, Stanford, California, United States.
ETH Zürich, Zürich, Switzerland.
AMIA Jt Summits Transl Sci Proc. 2023 Jun 16;2023:138-147. eCollection 2023.
Clinical notes are an essential component of a health record. This paper evaluates how natural language processing (NLP) can be used to identify the risk of acute care use (ACU) in oncology patients once chemotherapy starts. Risk prediction using structured health data (SHD) is now standard, but prediction from free-text formats is complex. This paper explores the use of free-text notes for the prediction of ACU in lieu of SHD. Deep learning models were compared to manually engineered language features. Results show that SHD models minimally outperform NLP models: an ℓ-penalised logistic regression with SHD achieved a C-statistic of 0.748 (95%-CI: 0.735, 0.762), while the same model with language features achieved 0.730 (95%-CI: 0.717, 0.745) and a transformer-based model achieved 0.702 (95%-CI: 0.688, 0.717). This paper shows how language models can be used in clinical applications and underlines how risk bias differs across diverse patient groups, even when using only free-text data.