Chen Aokun, Paredes Daniel, Yu Zehao, Lou Xiwei, Brunson Roberta, Thomas Jamie N, Martinez Kimberly A, Lucero Robert J, Magoc Tanja, Solberg Laurence M, Snigurska Urszula A, Ser Sarah E, Prosperi Mattia, Bian Jiang, Bjarnadottir Ragnhildur I, Wu Yonghui
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA.
UF Health Shands Hospital, Gainesville, FL, USA.
Proc (IEEE Int Conf Healthc Inform). 2024 Jun;2024:305-311. doi: 10.1109/ichi61247.2024.00046. Epub 2024 Aug 22.
Delirium is an acute decline or fluctuation in attention, awareness, or other cognitive function that can lead to serious adverse outcomes. Despite the severe outcomes, delirium is frequently unrecognized and uncoded in patients' electronic health records (EHRs) due to its transient and diverse nature. Natural language processing (NLP), a key technology that extracts medical concepts from clinical narratives, has shown great potential in studies of delirium outcomes and symptoms. To assist in the diagnosis and phenotyping of delirium, we formed an expert panel to categorize diverse delirium symptoms, composed annotation guidelines, created a delirium corpus with diverse delirium symptoms, and developed NLP methods to extract delirium symptoms from clinical notes. We compared 5 state-of-the-art transformer models including 2 models (BERT and RoBERTa) from the general domain and 3 models (BERT_MIMIC, RoBERTa_MIMIC, and GatorTron) from the clinical domain. GatorTron achieved the best strict and lenient F1 scores of 0.8055 and 0.8759, respectively. We conducted an error analysis to identify challenges in annotating delirium symptoms and developing NLP systems. To the best of our knowledge, this is the first large language model-based delirium symptom extraction system. Our study lays the foundation for the future development of computable phenotypes and diagnosis methods for delirium.
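The strict and lenient F1 scores reported above can be illustrated with a minimal sketch. The abstract does not define the matching criteria, so this assumes the common convention for concept extraction: a strict match requires identical span boundaries and the same category, and a lenient match requires only overlapping spans with the same category. The function names, the span representation, and the category labels are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A span is (start_offset, end_offset_exclusive, category).
Span = Tuple[int, int, str]


def _matches(pred: Span, gold: Span, lenient: bool) -> bool:
    """Return True if a predicted span counts as a match for a gold span."""
    if pred[2] != gold[2]:
        return False  # category must agree in both modes (assumption)
    if lenient:
        # Any character overlap is enough.
        return pred[0] < gold[1] and gold[0] < pred[1]
    # Strict: boundaries must be identical.
    return pred[0] == gold[0] and pred[1] == gold[1]


def span_f1(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """Compute F1 over spans, matching each gold span at most once."""
    used_gold = set()
    true_positives = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i not in used_gold and _matches(p, g, lenient):
                used_gold.add(i)
                true_positives += 1
                break
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical annotations for one note; labels are illustrative only.
    gold = [(0, 8, "Disorientation"), (20, 35, "Agitation")]
    pred = [(0, 8, "Disorientation"), (22, 35, "Agitation"), (40, 45, "Agitation")]
    print("strict F1:", span_f1(gold, pred, lenient=False))
    print("lenient F1:", span_f1(gold, pred, lenient=True))
```

In this toy case the second prediction has a boundary error, so it counts only under lenient matching. That is why a lenient score is typically at least as high as the strict one, as with the 0.8759 versus 0.8055 reported for GatorTron.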