Linfield Gaia H, Patel Shyam, Ko Hee Joo, Lacar Benjamin, Gottlieb Laura M, Adler-Milstein Julia, Singh Nina V, Pantell Matthew S, De Marchis Emilia H
School of Medicine, University of California, San Francisco, CA, USA.
Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Berkeley Institute for Data Science, University of California, Berkeley.
Health Informatics J. 2023 Jul-Sep;29(3):14604582231200300. doi: 10.1177/14604582231200300.
To evaluate how and from where social risk data are extracted from EHRs for research purposes, and how observed differences may impact study generalizability. Systematic scoping review of peer-reviewed literature that used patient-level EHR data to assess at least one of six social risk domains: housing, transportation, food, utilities, safety, and social support/isolation. Of 9022 identified articles, 111 met inclusion criteria. By domain, social support/isolation was most often included (n = 68/111), predominantly defined by marital/partner status (n = 48/68) and extracted from structured sociodemographic data (n = 45/48). Housing risk was defined primarily by homelessness (n = 39/49). Structured housing data were extracted most often from billing codes and screening tools (n = 15/30 and 13/30, respectively). Across domains, data were predominantly sourced from structured fields (n = 89/111) rather than unstructured free text (n = 32/111). We identified wide variability in how social domains are defined and extracted from EHRs for research. More consistency, particularly in how domains are operationalized, would enable greater insights across studies.