CARE-SD: classifier-based analysis for recognizing provider stigmatizing and doubt marker labels in electronic health records: model development and validation.

Author information

Walker Andrew, Thorne Annie, Das Sudeshna, Love Jennifer, Cooper Hannah L F, Livingston Melvin, Sarker Abeed

Affiliations

Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA 30322, United States.

Department of Infectious Disease, Children's Healthcare of Atlanta, Atlanta, GA 30329, United States.

Publication information

J Am Med Inform Assoc. 2025 Feb 1;32(2):365-374. doi: 10.1093/jamia/ocae310.

DOI:10.1093/jamia/ocae310

PMID:39724920

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11756621/

Abstract

OBJECTIVE

To detect and classify features of stigmatizing and biased language in intensive care electronic health records (EHRs) using natural language processing techniques.

MATERIALS AND METHODS

We first created a lexicon and regular expression lists from literature-driven stem words for linguistic features of stigmatizing patient labels, doubt markers, and scare quotes within EHRs. The lexicon was further extended using Word2Vec and GPT 3.5, and refined through human evaluation. These lexicons were used to search for matches across 18 million sentences from the de-identified Medical Information Mart for Intensive Care-III (MIMIC-III) dataset. For each linguistic bias feature, 1000 sentence matches were sampled, labeled by expert clinical and public health annotators, and used to train supervised learning classifiers.
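The lexicon search and sampling step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the term list is a hypothetical stand-in for the CARE-SD doubt marker lexicon, and the function names `build_pattern`, `find_matches`, and `sample_for_annotation` are assumptions introduced here.

```python
import random
import re

# Hypothetical lexicon entries for illustration only; NOT the CARE-SD lexicon.
DOUBT_MARKERS = ["claims", "claimed", "allegedly", "insists", "reportedly"]


def build_pattern(terms):
    """Compile one case-insensitive regex matching any term as a whole word."""
    # Longest terms first so longer expressions win over their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def find_matches(sentences, pattern):
    """Return (sentence, matched_term) pairs for sentences with a lexicon hit."""
    hits = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            hits.append((sentence, match.group(0).lower()))
    return hits


def sample_for_annotation(hits, n, seed=0):
    """Draw up to n matched sentences at random for human labeling."""
    rng = random.Random(seed)
    return list(hits) if len(hits) <= n else rng.sample(hits, n)


if __name__ == "__main__":
    notes = [
        "Patient claims he took his medication.",
        "Vital signs stable overnight.",
        "She ALLEGEDLY fell at home.",
    ]
    pattern = build_pattern(DOUBT_MARKERS)
    for sentence, term in sample_for_annotation(find_matches(notes, pattern), n=1000):
        print(term, "|", sentence)
```

A lexicon hit only nominates a sentence as a candidate. Whether the term is actually used in a doubting or stigmatizing way is what the annotators and the downstream classifiers decide.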

RESULTS

Lexicon development from expanded literature stem-word lists resulted in a doubt marker lexicon containing 58 expressions, and a stigmatizing labels lexicon containing 127 expressions. Classifiers for doubt markers and stigmatizing labels had the highest performance, with macro F1-scores of 0.84 and 0.79, positive-label recall and precision values ranging from 0.71 to 0.86, and accuracies aligning closely with human annotator agreement (0.87).
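For readers unfamiliar with the macro F1-score reported above, the following sketch shows how it is computed: an F1-score is calculated per class and the per-class scores are averaged without weighting by class size. The helper names and the toy labels are assumptions for illustration; the source does not say which tooling the authors used. In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
def f1_for_label(y_true, y_pred, label):
    """F1-score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels (1 = stigmatizing use, 0 = not); not data from the study.
    y_true = [1, 1, 0, 0]
    y_pred = [1, 0, 0, 0]
    print(round(macro_f1(y_true, y_pred), 4))
```

Because every class contributes equally, macro F1 does not let strong performance on a frequent negative class mask weak performance on a rarer positive class. This is one reason the positive-label precision and recall are worth reporting alongside it.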

DISCUSSION

This study demonstrated the feasibility of supervised classifiers in automatically identifying stigmatizing labels and doubt markers in medical text and identified trends in stigmatizing language use in an EHR setting. Additional labeled data may help improve the lower performance of the scare quote model.

CONCLUSIONS

Classifiers developed in this study showed high model performance and can be applied to identify patterns and target interventions to reduce stigmatizing labels and doubt markers in healthcare systems.
