Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA.
J Am Med Inform Assoc. 2021 Dec 28;29(1):187-196. doi: 10.1093/jamia/ocab199.
The aim of this study was to collect and synthesize evidence regarding data quality problems encountered when working with variables related to social determinants of health (SDoH).
We conducted a systematic review of the literature on social determinants research and data quality and then iteratively identified themes in the literature using a content analysis process.
The most commonly represented quality issue associated with SDoH data is plausibility (n = 31, 41%). Factors related to race and ethnicity have the largest body of literature (n = 40, 53%). The first theme, noted in 62% (n = 47) of articles, is that bias or validity issues often result from data quality problems. The most frequently identified validity issue is misclassification bias (n = 23, 30%). The second theme is that many of the articles suggest methods for mitigating the issues resulting from poor social determinants data quality. We grouped these into 5 suggestions: avoid complete case analysis, impute data, rely on multiple sources, use validated software tools, and select addresses thoughtfully.
The type of data quality problem varies depending on the variable, and each problem is associated with particular forms of analytical error. Problems encountered with the quality of SDoH data are rarely distributed randomly. Data from Hispanic patients are more prone to issues with plausibility and misclassification than data from other racial/ethnic groups.
Consideration of data quality and evidence-based quality improvement methods may help prevent bias and improve the validity of research conducted with SDoH data.
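The recommendation to avoid complete case analysis follows from the observation that SDoH data quality problems are rarely distributed randomly. The following is a minimal sketch of that reasoning, not the method of any reviewed study: the cohort is synthetic, and the group labels, counts, and the helper names `build_records`, `complete_cases`, and `outcome_rate` are assumptions made purely for illustration.

```python
# Illustrative only: synthetic records, hypothetical group labels and counts.
# Each record is (group, has_outcome, race_field_recorded).

def build_records():
    """Build a synthetic cohort where missingness differs by group."""
    records = []
    for i in range(100):
        # Group "A": 20% outcome rate, race field missing for 1 in 10 patients.
        records.append(("A", i < 20, i % 10 != 0))
    for i in range(100):
        # Group "B": 40% outcome rate, race field missing for 1 in 2 patients.
        records.append(("B", i < 40, i % 2 != 0))
    return records

def complete_cases(records):
    """Keep only records whose race field was recorded (complete case analysis)."""
    return [r for r in records if r[2]]

def outcome_rate(records):
    """Proportion of records with the outcome."""
    return sum(1 for r in records if r[1]) / len(records)

if __name__ == "__main__":
    cohort = build_records()
    kept = complete_cases(cohort)
    print(f"full cohort rate:    {outcome_rate(cohort):.3f}")
    print(f"complete case rate:  {outcome_rate(kept):.3f}")
    print(f"records dropped:     {len(cohort) - len(kept)}")
```

In this synthetic setup the full cohort has an outcome rate of 0.30, while the complete case subset has a rate of about 0.27, because the group with more missing values (and a higher outcome rate) is under-represented after records are dropped. Real data would require a principled approach such as imputation, which this sketch does not implement.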