Centre for Statistics in Medicine, Botnar Research Centre, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences (NDORMS), University of Oxford, Oxford, UK.
Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK.
Sci Data. 2024 Feb 22;11(1):221. doi: 10.1038/s41597-024-02958-1.
Intersectional social determinants, including ethnicity, are vital in health research. We curated a population-wide data resource of self-identified ethnicity data from over 60 million individuals in primary care in England and linked it to hospital records. We assessed the ethnicity data for completeness, consistency, and granularity, and found that one in ten individuals have no ethnicity information recorded in primary care. After linkage to hospital records, ethnicity data were complete for 94% of individuals. By reconciling SNOMED-CT concepts and census-level categories into a consistent hierarchy, we organised more than 250 ethnicity sub-groups, including and going beyond "White", "Black", "Asian", "Mixed" and "Other", and found them to be distributed in proportions similar to those of the general population. This large observational dataset presents an algorithmic hierarchy to represent self-identified ethnicity data collected across heterogeneous healthcare settings. Accurate and easily accessible ethnicity data can lead to a better understanding of population diversity, which is important for addressing disparities and for informing policy recommendations that can translate into better, fairer health for all.
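The two steps the abstract describes, completing missing primary care ethnicity from hospital records and rolling fine-grained sub-groups up into high-level categories, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the mapping `SUBGROUP_TO_GROUP`, the function names, the rule of preferring the primary care record, and the toy patient list are hypothetical and are not the study's codelist, algorithm, or data.

```python
from typing import Iterable, Optional

# Hypothetical mapping from fine-grained sub-groups to the five high-level
# groups named in the abstract. Illustrative only, not the study's codelist.
SUBGROUP_TO_GROUP = {
    "Irish": "White",
    "Caribbean": "Black",
    "Bangladeshi": "Asian",
    "White and Black African": "Mixed",
    "Arab": "Other",
}


def resolve_ethnicity(primary_care: Optional[str],
                      hospital: Optional[str]) -> Optional[str]:
    """Assumed rule: use the primary care record, else fall back to hospital."""
    return primary_care if primary_care is not None else hospital


def high_level_group(subgroup: Optional[str]) -> Optional[str]:
    """Roll a sub-group up to its high-level group (None if unknown/missing)."""
    if subgroup is None:
        return None
    return SUBGROUP_TO_GROUP.get(subgroup)


def completeness(values: Iterable[Optional[str]]) -> float:
    """Fraction of records with a non-missing value."""
    items = list(values)
    if not items:
        return 0.0
    return sum(v is not None for v in items) / len(items)


# Toy records: (primary care sub-group, hospital sub-group); None = missing.
patients = [
    ("Irish", None),
    (None, "Caribbean"),
    ("Arab", "Arab"),
    (None, None),
]

before = completeness(pc for pc, _ in patients)
resolved = [resolve_ethnicity(pc, hosp) for pc, hosp in patients]
after = completeness(resolved)
groups = [high_level_group(s) for s in resolved]

print(f"completeness before linkage: {before:.2f}")
print(f"completeness after linkage:  {after:.2f}")
print(groups)
```

In this toy example, linkage fills one of the two missing primary care values, so completeness rises from 0.50 to 0.75. A real pipeline would also need a rule for records where the two sources disagree, which this sketch does not attempt.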