Walker Andrew, Thorne Annie, Das Sudeshna, Love Jennifer, Cooper Hannah L F, Livingston Melvin, Sarker Abeed
Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, United States.
Department of Infectious Disease, Children's Healthcare of Atlanta, Atlanta, GA 30329, United States.
J Am Med Inform Assoc. 2025 Feb 1;32(2):365-374. doi: 10.1093/jamia/ocae310.
To detect and classify features of stigmatizing and biased language in intensive care electronic health records (EHRs) using natural language processing techniques.
We first created a lexicon and regular expression lists from literature-driven stem words for three linguistic features of bias in EHRs: stigmatizing patient labels, doubt markers, and scare quotes. The lexicons were further extended using Word2Vec and GPT-3.5, and refined through human evaluation. These lexicons were used to search for matches across 18 million sentences from the de-identified Medical Information Mart for Intensive Care-III (MIMIC-III) dataset. For each linguistic bias feature, 1000 sentence matches were sampled, labeled by expert clinical and public health annotators, and used to train supervised learning classifiers.
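The lexicon-and-regular-expression matching step can be sketched as follows. This is a minimal illustration with hypothetical lexicon entries, not the study's actual lexicons (which contained 58 doubt-marker and 127 stigmatizing-label expressions):

```python
import re

# Hypothetical stem-word entries for the three bias features (illustrative only).
LEXICONS = {
    "stigmatizing_label": [r"\bnon[- ]?compliant\b", r"\bdrug[- ]seeking\b"],
    "doubt_marker": [r"\bclaims?\b", r"\ballegedly\b", r"\binsists?\b"],
    # Scare quotes: a quoted word or short phrase, e.g. the patient's "pain".
    "scare_quote": [r"\"[A-Za-z]+(?: [A-Za-z]+)?\""],
}

# Compile one alternation pattern per feature for fast scanning of sentences.
PATTERNS = {
    feature: re.compile("|".join(exprs), re.IGNORECASE)
    for feature, exprs in LEXICONS.items()
}

def match_features(sentence: str) -> list[str]:
    """Return the bias features whose lexicon matches the sentence."""
    return [f for f, pat in PATTERNS.items() if pat.search(sentence)]
```

In the study's pipeline, sentences flagged by such patterns were then sampled for annotation rather than treated as confirmed instances of bias, since a lexicon match alone does not establish stigmatizing intent.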
Lexicon development from expanded literature stem-word lists resulted in a doubt marker lexicon containing 58 expressions, and a stigmatizing labels lexicon containing 127 expressions. Classifiers for doubt markers and stigmatizing labels had the highest performance, with macro F1-scores of 0.84 and 0.79, positive-label recall and precision values ranging from 0.71 to 0.86, and accuracies aligning closely with human annotator agreement (0.87).
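The reported macro F1-score is the unweighted mean of per-class F1 scores, so both the positive (biased) and negative classes contribute equally. A minimal sketch of the metric, using illustrative confusion counts rather than the study's data:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Per-class precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def macro_f1(per_class_counts: list[tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 scores (macro-averaging)."""
    return sum(precision_recall_f1(*c)[2] for c in per_class_counts) / len(per_class_counts)
```

Macro-averaging is the natural choice here because the biased-language class is much rarer than the neutral class in EHR text, and a micro-averaged or accuracy-based score would be dominated by the majority class.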
This study demonstrated the feasibility of supervised classifiers for automatically identifying stigmatizing labels and doubt markers in medical text and identified trends in stigmatizing language use in an EHR setting. Additional labeled data may help improve the lower performance of the scare quote model.
Classifiers developed in this study showed high model performance and can be applied to identify patterns and target interventions to reduce stigmatizing labels and doubt markers in healthcare systems.