

Underserved populations with missing race ethnicity data differ significantly from those with structured race/ethnicity documentation.

Affiliations

Information Technologies & Services Department, Weill Cornell Medicine, New York, New York, USA.

Department of Medicine, Weill Cornell Medicine, New York, New York, USA.

Publication information

J Am Med Inform Assoc. 2019 Aug 1;26(8-9):722-729. doi: 10.1093/jamia/ocz040.

Abstract

OBJECTIVE

We aimed to address deficiencies in structured electronic health record (EHR) data for race and ethnicity by identifying black and Hispanic patients from unstructured clinical notes and assessing differences between patients with or without structured race/ethnicity data.

MATERIALS AND METHODS

Using EHR notes for 16 665 patients with encounters at a primary care practice, we developed rule-based natural language processing (NLP) algorithms to classify patients as black/Hispanic. We evaluated performance of the method against an annotated gold standard, compared race and ethnicity between NLP-derived and structured EHR data, and compared characteristics of patients identified as black or Hispanic using only NLP vs patients identified as such only in structured EHR data.
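The abstract does not list the study's rules, so the following is only a minimal sketch of what a rule-based classifier over clinical notes could look like. The patterns, the `classify_patient` function, and the `nlp_only` helper are hypothetical illustrations, not the authors' algorithm; a real system would need a far richer rule set and validation against an annotated gold standard, as the study describes.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical patterns for illustration only; NOT the rules used in the study.
# "black" must be followed by a person noun so that phrases such as
# "black stool" are not misread as a race mention.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "black": re.compile(
        r"\b(?:african[- ]american|black)\s+"
        r"(?:male|female|man|woman|patient|gentleman|lady)\b",
        re.IGNORECASE,
    ),
    "hispanic": re.compile(r"\b(?:hispanic|latino|latina)\b", re.IGNORECASE),
}


def classify_patient(notes: Iterable[str]) -> Set[str]:
    """Return the labels whose pattern matches at least one of the patient's notes."""
    labels: Set[str] = set()
    for note in notes:
        for label, pattern in PATTERNS.items():
            if pattern.search(note):
                labels.add(label)
    return labels


def nlp_only(nlp_labels: Set[str], structured_labels: Set[str]) -> Set[str]:
    """Labels found by the note-based rules but absent from the structured fields."""
    return nlp_labels - structured_labels
```

Under these assumed patterns, a note reading "62 yo African-American male with hypertension." yields the label `black`, while "Black stool noted on exam." yields no label. The set difference in `nlp_only` mirrors the comparison the abstract describes between patients identified only via NLP and those identified in structured data.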

RESULTS

For the sample of 16 665 patients, NLP identified 948 additional patients as black, a 26% increase, and 665 additional patients as Hispanic, a 20% increase. Compared with the patients identified as black or Hispanic in structured EHR data, patients identified as black or Hispanic via NLP only were older, more likely to be male, less likely to have commercial insurance, and more likely to have higher comorbidity.

DISCUSSION

Structured EHR data for race and ethnicity are subject to data quality issues. Supplementing structured EHR race data with NLP-derived race and ethnicity may allow researchers to better assess the demographic makeup of populations and draw more accurate conclusions about intergroup differences in health outcomes.

CONCLUSIONS

Black or Hispanic patients who are not documented as such in structured EHR race/ethnicity fields differ significantly from those who are. Relatively simple NLP can help address this limitation.


