Hsu Chao-Chun, Obermeyer Ziad, Tan Chenhao
University of Chicago, Chicago, IL, USA.
University of California, Berkeley, CA, USA.
Nat Commun. 2025 Jul 1;16(1):5791. doi: 10.1038/s41467-025-60865-4.
Clinical notes should capture important information from a physician-patient encounter, but they may also contain signals indicative of physician fatigue. Using data from 129,228 emergency department (ED) visits, we train a model to identify notes written by physicians who are likely to be tired: those who worked ED shifts on at least 5 of the prior 7 days. In a hold-out set, the model accurately identifies notes written by such high-workload physicians. It also flags notes written in other settings with high fatigue: overnight shifts and high patient volumes. When the model identifies signs of fatigue in a note, physician decision-making for that patient appears worse: yield of testing for heart attack is 19% lower with each standard deviation increase in model-predicted fatigue. A key feature of notes written by fatigued doctors is the predictability of the next word, given the preceding context. Perhaps unsurprisingly, because word prediction is the core of how large language models (LLMs) work, we find that predicted fatigue of LLM-written notes is 74% higher than that of physician-written ones, highlighting the possibility that LLMs may introduce distortions in generated text that are not yet fully understood.
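The notion of next-word predictability mentioned above can be illustrated with a minimal sketch. The abstract does not say how predictability was measured, so this example substitutes a toy bigram model with add-one smoothing, which is an assumption and not the authors' method. The names `BigramModel` and `mean_surprisal` and the sample notes are hypothetical. Lower mean surprisal (in bits per word) means each word is easier to predict from the word before it.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


class BigramModel:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, corpus):
        self.context_counts = Counter()  # how often each word appears as context
        self.bigram_counts = Counter()   # how often each (previous, current) pair appears
        self.vocab = set()
        for doc in corpus:
            tokens = ["<s>"] + tokenize(doc)  # "<s>" marks the start of a note
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def prob(self, prev, cur):
        """Smoothed P(cur | prev); one extra slot is reserved for unseen words."""
        vocab_size = len(self.vocab) + 1
        return (self.bigram_counts[(prev, cur)] + 1) / (
            self.context_counts[prev] + vocab_size
        )


def mean_surprisal(model, text):
    """Average of -log2 P(word | previous word) over the words of a note."""
    tokens = ["<s>"] + tokenize(text)
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    return sum(-math.log2(model.prob(p, c)) for p, c in pairs) / len(pairs)


# Hypothetical training notes, invented for illustration only.
training_notes = [
    "patient denies chest pain",
    "patient denies chest pain",
    "patient denies shortness of breath",
]
model = BigramModel(training_notes)

templated = mean_surprisal(model, "patient denies chest pain")
unusual = mean_surprisal(model, "pain radiates breath of denies")
print(f"templated note: {templated:.2f} bits/word")
print(f"unusual note:   {unusual:.2f} bits/word")
```

Under these assumptions, the templated note scores lower mean surprisal than the unusual one. A study on real notes would presumably rely on a far stronger language model and a proper tokenizer than this sketch does.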