Prigge Regina, Fleetwood Kelly J, Jackson Caroline A, Mercer Stewart W, Kelly Paul At, Sudlow Cathie, Norrie John D, Morales Daniel R, Smith Daniel J, Guthrie Bruce
Usher Institute, University of Edinburgh, Edinburgh, UK.
Public Member of Study Advisory Board, Edinburgh, UK.
Commun Med (Lond). 2025 Jul 8;5(1):283. doi: 10.1038/s43856-025-00995-4.
Measurement of multimorbidity, the co-occurrence of two or more conditions in the same individual, is highly variable, which limits the consistency and reproducibility of research.
Using data from 172,563 UK Biobank (UKB) participants and a cross-sectional approach, we examined how choice of data source affected estimated prevalence of 80 individual long-term conditions (LTCs) and multimorbidity. We developed code-list-based algorithms to determine the prevalence of 80 LTCs in (1) primary care records, (2) UKB baseline assessment, (3) hospital/cancer registry records, and (4) all three data sources together.
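A minimal sketch of how such a comparison could be set up is given below. It is not the authors' algorithm: it assumes the code-list matching has already been done, so that each data source is reduced to a mapping from participant ID to the set of LTCs found in that source. All names (`combine_sources`, `condition_prevalence`, `multimorbidity_prevalence`) and the toy records are hypothetical.

```python
from typing import Dict, Set

# Hypothetical layout: participant ID -> set of LTC names found in one source.
Records = Dict[str, Set[str]]


def combine_sources(*sources: Records) -> Records:
    """Union the LTCs recorded for each participant across all sources."""
    combined: Records = {}
    for source in sources:
        for pid, conditions in source.items():
            # A fresh set is created per participant, so inputs are not mutated.
            combined.setdefault(pid, set()).update(conditions)
    return combined


def condition_prevalence(records: Records, condition: str, n_participants: int) -> float:
    """Proportion of the cohort with a given LTC recorded."""
    cases = sum(1 for conds in records.values() if condition in conds)
    return cases / n_participants


def multimorbidity_prevalence(records: Records, n_participants: int, threshold: int = 2) -> float:
    """Proportion of the cohort with at least `threshold` LTCs recorded."""
    cases = sum(1 for conds in records.values() if len(conds) >= threshold)
    return cases / n_participants


# Toy data only (not study data); participant "p4" has no recorded LTC.
primary_care = {"p1": {"diabetes", "depression"}, "p2": {"asthma"}}
baseline = {"p1": {"diabetes"}, "p3": {"asthma"}}
hospital = {"p1": {"diabetes"}, "p2": {"ckd"}}
n = 4

all_sources = combine_sources(primary_care, baseline, hospital)
print(condition_prevalence(primary_care, "asthma", n))
print(condition_prevalence(all_sources, "asthma", n))
print(multimorbidity_prevalence(all_sources, n))
```

In this toy cohort, combining sources raises both the estimated prevalence of a single condition and the estimated prevalence of multimorbidity, because each source captures cases the others miss.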
Using records from all three data sources, 146,811 (85.1%) participants have at least one and 109,609 (63.5%) have at least two LTCs at baseline. A median of 4.7% (IQR 1.0-16.6) of participants with a condition are identified by all three data sources. Agreement is highest for endocrine, nutritional and metabolic disorders, with a median of 32.9% (IQR 20.5-34.1) of individuals with a condition identified by all three data sources. Agreement is lowest for diseases of the genitourinary system and for mental and behavioural disorders, where the proportion identified by all three data sources ranges across conditions from zero to 4.9% and from zero to 12.3%, respectively. This low agreement is accompanied by high proportions of individuals with a condition identified only in primary care data (i.e. in neither of the other two sources), with a median of 59.3% (IQR 47.4-75.9) for diseases of the genitourinary system and 66.9% (IQR 42.8-79.2) for mental and behavioural disorders.
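The agreement figures above can be illustrated with a small sketch for a single condition. It assumes, as one plausible reading of "participants with a condition", that the denominator is everyone identified by at least one source. The function name `source_agreement` and the toy participant IDs are hypothetical.

```python
from typing import Dict, Optional, Set


def source_agreement(
    primary_care: Set[int], baseline: Set[int], hospital: Set[int]
) -> Dict[str, Optional[float]]:
    """Agreement between three sources for one condition.

    Each argument is the set of participant IDs with the condition in that
    source. Denominator (an assumption): participants identified by any source.
    """
    any_source = primary_care | baseline | hospital
    if not any_source:
        # No cases anywhere, so the proportions are undefined.
        return {"all_three": None, "primary_care_only": None}
    all_three = primary_care & baseline & hospital
    primary_care_only = primary_care - baseline - hospital
    return {
        "all_three": len(all_three) / len(any_source),
        "primary_care_only": len(primary_care_only) / len(any_source),
    }


# Toy IDs only (not study data): 8 participants have the condition in some source.
result = source_agreement(
    primary_care={1, 2, 3, 4, 5},
    baseline={1, 2, 6},
    hospital={1, 7, 8},
)
print(result)
```

Repeating this per condition and taking the median and interquartile range of each proportion within a disease category would give summaries of the kind reported here.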
Our study highlights how the choice of data source affects research on individual LTCs and multimorbidity, and the importance of clearly justifying the choices made.