Protocol for improving equity in quantitative big data cleaning: lessons from longitudinal analysis of electronic health records from underrepresented and marginalized communities.

Author information

Buchanan Zeruiah V, Hopkins Scarlett E, Boyer Bert B, Fohner Alison E

Affiliations

Department of Epidemiology, University of Washington, Seattle, WA, United States.

Robert Wood Johnson Health Policy Scholars Program, Johns Hopkins University, Baltimore, MD, United States.

Publication information

Int J Epidemiol. 2025 Feb 16;54(2). doi: 10.1093/ije/dyaf013.

Abstract

BACKGROUND

Large biomedical datasets, including electronic health records (EHRs), are a significant source of epidemiologic data. Several data-cleaning approaches can be used to prepare an EHR for analysis; here, we focus on data filtering. Common data-filtering methods employ rules derived from data on socially constructed dominant populations. These rules are inappropriate for marginalized populations and lead to the loss of valuable data and the neglect of underrepresented communities. We propose a novel method based on a phenomenological framework that is more equitable and inclusive, leading to culturally responsive research and discoveries.

METHODS

EHRs from the Yukon-Kuskokwim Health Corporation (YKHC) containing 1 262 035 records from 12 402 unique individuals from 2002 to 2012 were cleaned using both the proposed phenomenological (individual-level) approach and the common (cohort-level) data-filtering approach. Within the phenomenological framework, we (i) excluded values that were undeniably biologically impossible for any population, (ii) excluded values that fell more than three standard deviations from each individual's own mean, and (iii) used two forms of imputation for stable quantitative and qualitative values at the individual level when data were missing.
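Steps (i) and (ii) above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `(person_id, value)` record layout, and the plausibility bounds are hypothetical, and the abstract does not specify the rules of the "common" approach, so a cohort-wide three-standard-deviation cutoff is assumed here purely for contrast. Imputation (step iii) is omitted.

```python
from collections import defaultdict
from statistics import mean, stdev


def drop_impossible(records, plausible_range):
    """Step (i): drop values outside bounds that are impossible for anyone.

    `records` is a list of (person_id, value) pairs (hypothetical layout).
    """
    low, high = plausible_range
    return [(pid, v) for pid, v in records if low <= v <= high]


def filter_individual(records, plausible_range, k=3.0):
    """Step (ii): keep values within k SDs of each person's OWN mean."""
    plausible = drop_impossible(records, plausible_range)
    by_person = defaultdict(list)
    for pid, v in plausible:
        by_person[pid].append(v)

    kept = []
    for pid, v in plausible:
        values = by_person[pid]
        if len(values) < 2:
            # Assumption: an SD is undefined for one reading, so keep it.
            kept.append((pid, v))
            continue
        m, s = mean(values), stdev(values)
        if s == 0 or abs(v - m) <= k * s:
            kept.append((pid, v))
    return kept


def filter_cohort(records, plausible_range, k=3.0):
    """Assumed stand-in for the common approach: one cutoff for everyone."""
    plausible = drop_impossible(records, plausible_range)
    values = [v for _, v in plausible]
    m, s = mean(values), stdev(values)
    return [(pid, v) for pid, v in plausible if abs(v - m) <= k * s]


# Toy data: person "A" has many readings near 100, person "B" has two near 200.
records = [("A", 95.0), ("A", 105.0)] * 15 + [("B", 195.0), ("B", 205.0)]
records.append(("A", -40.0))  # impossible value, removed by step (i)

individual = filter_individual(records, plausible_range=(0.0, 500.0))
cohort = filter_cohort(records, plausible_range=(0.0, 500.0))
print(len(individual), len(cohort))
```

In this toy data the cohort-wide cutoff discards both of B's readings because they sit far from the pooled mean, while the individual-level rule keeps them because they are typical for B. That is the kind of participant loss the abstract attributes to filtering rules built on a dominant population.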

RESULTS

Compared with common data-filtering practices, the phenomenological approach retained more observations, more participants, and a greater range of outcomes, allowing a truer representation of the priority population. In sensitivity analyses comparing results from the raw data, the common approach, and the phenomenological approach, we found that the phenomenological approach did not compromise the integrity of the results.

CONCLUSION

The phenomenological approach to filtering big data presents an opportunity to better advocate for marginalized communities even when using large datasets that require automated rules for data filtering. Our method may empower researchers who are partnering with communities to embrace large datasets without compromising their commitment to community benefit and respect.
