Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation.

Affiliations

Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA.

Department of Medicine, Weill Cornell Medicine, New York, New York, USA.

Publication information

J Am Med Inform Assoc. 2019 Aug 1;26(8-9):722-729. doi: 10.1093/jamia/ocz040.

DOI:10.1093/jamia/ocz040

PMID:31329882

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6696506/

Abstract

OBJECTIVE

We aimed to address deficiencies in structured electronic health record (EHR) data for race and ethnicity by identifying black and Hispanic patients from unstructured clinical notes and assessing differences between patients with or without structured race/ethnicity data.

MATERIALS AND METHODS

Using EHR notes for 16 665 patients with encounters at a primary care practice, we developed rule-based natural language processing (NLP) algorithms to classify patients as black/Hispanic. We evaluated performance of the method against an annotated gold standard, compared race and ethnicity between NLP-derived and structured EHR data, and compared characteristics of patients identified as black or Hispanic using only NLP vs patients identified as such only in structured EHR data.
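The abstract does not list the rules themselves, so the following is only a minimal sketch of what a rule-based classifier and its evaluation against a gold standard might look like. The regular expressions, the function names (`classify_note`, `classify_patient`, `precision_recall`), and the sample note text are all hypothetical and are not taken from the study.

```python
import re

# Hypothetical patterns for illustration; the study's actual rules are not
# given in the abstract.
BLACK_PATTERNS = [
    # Require a person noun after "black" so phrases like "black stool" do not match.
    re.compile(
        r"\b(?:african[- ]american|black)\s+"
        r"(?:male|female|man|woman|gentleman|lady|patient)\b",
        re.IGNORECASE,
    ),
]
HISPANIC_PATTERNS = [
    # Fixed-width lookbehinds skip "non-Hispanic" and "non Hispanic".
    re.compile(r"(?<!non-)(?<!non )\b(?:hispanic|latino|latina)\b", re.IGNORECASE),
]


def classify_note(text):
    """Return the set of labels whose patterns match a single note."""
    labels = set()
    if any(p.search(text) for p in BLACK_PATTERNS):
        labels.add("black")
    if any(p.search(text) for p in HISPANIC_PATTERNS):
        labels.add("hispanic")
    return labels


def classify_patient(notes):
    """Union the labels found across all of a patient's notes."""
    labels = set()
    for note in notes:
        labels |= classify_note(note)
    return labels


def precision_recall(predicted, gold):
    """Compare two sets of patient ids that carry the same label."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For example, `classify_patient(["54 yo African-American male with HTN"])` returns `{"black"}`, while a note reading "Non-Hispanic white female" yields no label. A production rule set would need many more patterns plus handling for family-history and negated mentions.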

RESULTS

For the sample of 16 665 patients, NLP identified 948 additional patients as black, a 26% increase, and 665 additional patients as Hispanic, a 20% increase. Compared with the patients identified as black or Hispanic in structured EHR data, patients identified as black or Hispanic via NLP only were older, more likely to be male, less likely to have commercial insurance, and more likely to have higher comorbidity.
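The percentages above are increases relative to the counts already present in structured data, which the abstract does not report. As a rough illustration of the arithmetic, and assuming each percentage was rounded to the nearest whole percent, the sketch below back-calculates the range of structured counts consistent with the reported figures. The function names are hypothetical and the resulting ranges are approximations, not values from the study.

```python
def relative_increase(additional, baseline):
    """Fractional increase when `additional` patients are added to `baseline`."""
    return additional / baseline


def implied_baseline_range(additional, pct_rounded):
    """Bounds on the baseline consistent with a percentage rounded to a whole percent."""
    low = additional / ((pct_rounded + 0.5) / 100)
    high = additional / ((pct_rounded - 0.5) / 100)
    return low, high


# Reported figures from the abstract: additional patients and percent increase.
for label, additional, pct in [("black", 948, 26), ("Hispanic", 665, 20)]:
    low, high = implied_baseline_range(additional, pct)
    print(f"{label}: +{additional} at ~{pct}% implies roughly "
          f"{low:.0f} to {high:.0f} patients in structured data")
```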

DISCUSSION

Structured EHR data for race and ethnicity are subject to data quality issues. Supplementing structured EHR race data with NLP-derived race and ethnicity may allow researchers to better assess the demographic makeup of populations and draw more accurate conclusions about intergroup differences in health outcomes.
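One way to picture this supplementation is a simple merge that records where each value came from. The sketch below assumes structured documentation takes precedence when present and that NLP-derived values only fill gaps; the abstract does not specify this precedence, and the `supplement` function and the patient ids are hypothetical.

```python
def supplement(structured, nlp_derived):
    """Fill missing structured values with NLP-derived ones, tracking the source.

    Both arguments map patient id -> label (or None when nothing is recorded).
    """
    merged = {}
    for patient_id in structured.keys() | nlp_derived.keys():
        structured_value = structured.get(patient_id)
        nlp_value = nlp_derived.get(patient_id)
        if structured_value:
            # Assumption: structured documentation wins when it exists.
            merged[patient_id] = (structured_value, "structured")
        elif nlp_value:
            merged[patient_id] = (nlp_value, "nlp")
        else:
            merged[patient_id] = (None, "missing")
    return merged


structured = {"p1": "black", "p2": None, "p3": None}
nlp_derived = {"p2": "hispanic"}
merged = supplement(structured, nlp_derived)
```

Keeping the source tag alongside each value lets an analysis compare the structured-only and NLP-only groups later, which is the comparison the study reports.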

CONCLUSIONS

Black or Hispanic patients who are not documented as such in structured EHR race/ethnicity fields differ significantly from those who are. Relatively simple NLP can help address this limitation.
