Examining Linguistic Differences in Electronic Health Records for Diverse Patients With Diabetes: Natural Language Processing Analysis.

Author information

Bilotta Isabel, Tonidandel Scott, Liaw Winston R, King Eden, Carvajal Diana N, Taylor Ayana, Thamby Julie, Xiang Yang, Tao Cui, Hansen Michael

Affiliations

Deutser, Houston, TX, United States.

Belk College of Business, University of North Carolina at Charlotte, Charlotte, NC, United States.

Publication information

JMIR Med Inform. 2024 May 23;12:e50428. doi: 10.2196/50428.

DOI:10.2196/50428

PMID:38787295

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11137426/

Abstract

BACKGROUND

Individuals from minoritized racial and ethnic backgrounds experience pernicious and pervasive health disparities that have emerged, in part, from clinician bias.

OBJECTIVE

We used a natural language processing approach to examine whether linguistic markers in electronic health record (EHR) notes differ based on the race and ethnicity of the patient. To validate this methodological approach, we also assessed the extent to which clinicians perceive linguistic markers to be indicative of bias.

METHODS

In this cross-sectional study, we extracted EHR notes for patients who were aged 18 years or older; had more than 5 years of diabetes diagnosis codes; and received care between 2006 and 2014 from family physicians, general internists, or endocrinologists practicing in an urban, academic network of clinics. The race and ethnicity of patients were defined as White non-Hispanic, Black non-Hispanic, or Hispanic or Latino. We hypothesized that Sentiment Analysis and Social Cognition Engine (SEANCE) components (ie, negative adjectives, positive adjectives, joy words, fear and disgust words, politics words, respect words, trust verbs, and well-being words) and mean word count would be indicators of bias if racial differences emerged. We performed linear mixed effects analyses to examine the relationship between the outcomes of interest (the SEANCE components and word count) and patient race and ethnicity, controlling for patient age. To validate this approach, we asked clinicians to indicate the extent to which they thought variation in the use of SEANCE language domains for different racial and ethnic groups was reflective of bias in EHR notes.
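To make the idea of a per-note linguistic marker concrete, the following is a minimal sketch of lexicon-based scoring. It is not SEANCE's actual computation: the word lists, the `LEXICONS` dictionary, and the `tokenize` and `score_note` functions are hypothetical and invented for illustration only. The sketch reports, for one note, the word count and the proportion of tokens that fall in each category.

```python
import re

# Hypothetical mini-lexicons, invented for illustration only.
# They are NOT the SEANCE word lists or component definitions.
LEXICONS = {
    "negative_adjectives": {"noncompliant", "difficult", "poor"},
    "positive_adjectives": {"pleasant", "cooperative", "good"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def score_note(text, lexicons=LEXICONS):
    """Return the word count and the proportion of tokens in each lexicon."""
    tokens = tokenize(text)
    n = len(tokens)
    scores = {"word_count": n}
    for name, words in lexicons.items():
        hits = sum(1 for token in tokens if token in words)
        # Normalize by note length so long and short notes are comparable.
        scores[name] = hits / n if n else 0.0
    return scores


# A made-up sentence, not taken from any real EHR note.
note = "Pleasant patient with poor glycemic control; remains noncompliant with diet."
print(score_note(note))
```

Scores of this kind, computed per note, could then serve as outcomes in a regression on patient race and ethnicity with age as a covariate, which is the role the SEANCE components and word count play in the study's linear mixed effects analyses.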

RESULTS

We examined EHR notes (n=12,905) of Black non-Hispanic, White non-Hispanic, and Hispanic or Latino patients (n=1562), who were seen by 281 physicians. A total of 27 clinicians participated in the validation study. In terms of bias, participants rated negative adjectives as 8.63 (SD 2.06), fear and disgust words as 8.11 (SD 2.15), and positive adjectives as 7.93 (SD 2.46) on a scale of 1 to 10, with 10 being extremely indicative of bias. Notes for Black non-Hispanic patients contained significantly more negative adjectives (coefficient 0.07, SE 0.02) and significantly more fear and disgust words (coefficient 0.007, SE 0.002) than those for White non-Hispanic patients. The notes for Hispanic or Latino patients included significantly fewer positive adjectives (coefficient -0.02, SE 0.007), trust verbs (coefficient -0.009, SE 0.004), and joy words (coefficient -0.03, SE 0.01) than those for White non-Hispanic patients.
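The abstract reports coefficients and standard errors but not test statistics or confidence intervals. As a rough reading aid, the sketch below computes a Wald-type ratio and an approximate 95% interval from the rounded values above. It assumes a normal approximation with a critical value of 1.96, which may differ from the degrees-of-freedom method the authors used for their mixed models; the `wald_summary` function and the `reported` dictionary are hypothetical names introduced here.

```python
def wald_summary(coef, se, z_crit=1.96):
    """Return (z, lower, upper) under a normal approximation (an assumption)."""
    z = coef / se
    return z, coef - z_crit * se, coef + z_crit * se


# Coefficients and standard errors as reported (rounded) in the abstract,
# each relative to notes for White non-Hispanic patients.
reported = {
    "Black non-Hispanic: negative adjectives": (0.07, 0.02),
    "Black non-Hispanic: fear and disgust words": (0.007, 0.002),
    "Hispanic or Latino: positive adjectives": (-0.02, 0.007),
    "Hispanic or Latino: trust verbs": (-0.009, 0.004),
    "Hispanic or Latino: joy words": (-0.03, 0.01),
}

for label, (coef, se) in reported.items():
    z, lower, upper = wald_summary(coef, se)
    print(f"{label}: z = {z:.2f}, approx. 95% CI [{lower:.4f}, {upper:.4f}]")
```

Under this approximation, each ratio exceeds 1.96 in absolute value, which is consistent with the abstract describing all five differences as significant. Because the inputs are rounded, the resulting intervals are only indicative.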

CONCLUSIONS

This approach may enable physicians and researchers to identify and mitigate bias in medical interactions, with the goal of reducing health disparities stemming from bias.
