Petersen Irene, Welch Catherine A, Nazareth Irwin, Walters Kate, Marston Louise, Morris Richard W, Carpenter James R, Morris Tim P, Pham Tra My
Department of Primary Care and Population Health, University College London, London NW3 2PF, UK
Department of Clinical Epidemiology, Aarhus University, 8200 Aarhus N, Denmark
Clin Epidemiol. 2019 Feb 11;11:157-167. doi: 10.2147/CLEP.S191437. eCollection 2019.
Clinical databases are increasingly used for health research; many of them capture information on common health indicators including height, weight, blood pressure, cholesterol level, smoking status, and alcohol consumption. However, these are often not recorded on a regular basis; missing data are ubiquitous. We described the recording of health indicators in UK primary care and evaluated key implications for handling missing data.
We examined the recording of health indicators in The Health Improvement Network (THIN) UK primary care database over time, by demographic variables (age and sex) and by chronic disease (diabetes, myocardial infarction, and stroke). Using weight as an example, we fitted linear and logistic regression models to examine how the recorded weight values, and the probability of having weight recorded, were associated with individuals' demographic characteristics and chronic diseases.
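A minimal sketch of the kind of missingness model described above, assuming a binary indicator of whether weight was recorded and three covariates (standardised age, female sex, diabetes). The data are simulated from an assumed model purely for illustration; the helper names (`simulate`, `solve`, `fit_logistic`) and all coefficients are hypothetical and do not come from THIN or from the authors' analysis. The logistic model is fitted by Newton-Raphson in plain Python; in practice a statistics package would normally be used.

```python
import math
import random


def simulate(n, seed=1):
    """Generate hypothetical records: (age_std, female, diabetes, recorded)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(18, 99)
        female = 1 if rng.random() < 0.5 else 0
        diabetes = 1 if rng.random() < 0.1 else 0
        age_std = (age - 58.5) / 23.4  # centre and scale age (assumed constants)
        # Assumed true model: recording is more likely for women and for people with diabetes.
        eta = -0.5 + 0.3 * age_std + 0.6 * female + 1.2 * diabetes
        recorded = 1 if rng.random() < 1 / (1 + math.exp(-eta)) else 0
        rows.append((age_std, female, diabetes, recorded))
    return rows


def solve(a, b):
    """Solve a small dense linear system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def fit_logistic(X, y, max_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p                      # score vector X'(y - mu)
        info = [[0.0] * p for _ in range(p)]  # information matrix X'WX
        for xi, yi in zip(X, y):
            mu = 1 / (1 + math.exp(-sum(bj * xj for bj, xj in zip(beta, xi))))
            w = mu * (1 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [bj + s for bj, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


rows = simulate(5000)
X = [(1.0, age_std, female, diabetes) for age_std, female, diabetes, _ in rows]  # intercept first
y = [row[3] for row in rows]
beta = fit_logistic(X, y)
print(dict(zip(["intercept", "age_std", "female", "diabetes"], (round(b, 2) for b in beta))))
```

Clearly non-zero coefficients for sex or diabetes indicate that missingness depends on observed characteristics, the situation in which an analysis of available data alone can be biased and in which those variables belong in any imputation model. The linear model for observed weight values would be fitted analogously, restricted to records with weight present.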
In total, 6,345,851 individuals aged 18-99 years contributed data to THIN between 2000 and 2015. Women aged 18-65 years were more likely than men of the same age to have health indicators recorded; this gap narrowed after age 65. About 60%-80% of individuals had their height, weight, blood pressure, smoking status, and alcohol consumption recorded during the first year of registration. In the years following registration, these proportions fell to 10%-40%. Individuals with chronic diseases were more likely to have health indicators recorded, particularly after the introduction of a General Practitioner incentive scheme. Individuals' demographic characteristics and chronic diseases were associated both with the observed weight values and with the probability that weight was missing.
Missing data in common health indicators will affect statistical analysis in health research studies. A single analysis of primary care data using only the available information may be misleading. Multiple imputation of missing values, with imputation models that account for demographic characteristics and disease status, is recommended, but it should be planned and implemented carefully. Sensitivity analyses exploring alternative assumptions about the missing data should also be performed.
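After each of the m imputed datasets has been analysed, the results are usually combined with Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below assumes the analysis yields one coefficient and its squared standard error per imputed dataset; the function name `pool_rubin` and the input numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimates (e.g. one regression coefficient) from each imputed dataset.
    variances: the corresponding squared standard errors.
    Returns (pooled estimate, total variance, within variance, between variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputed datasets with matching variances")
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance W
    between = statistics.variance(estimates)  # between-imputation variance B (m - 1 denominator)
    total = within + (1 + 1 / m) * between    # total variance T = W + (1 + 1/m) B
    return q_bar, total, within, between


# Hypothetical coefficients and variances from m = 3 imputed datasets (illustrative only).
est, var_total, w, b = pool_rubin([1.0, 1.2, 1.1], [0.04, 0.05, 0.06])
print(round(est, 4), round(var_total, 4))  # → 1.1 0.0633
print("pooled SE:", round(math.sqrt(var_total), 4))
```

For the sensitivity analysis, one common option (not specified in the abstract) is delta adjustment: imputed values are shifted by a chosen offset δ before analysis, the pooled estimate is recomputed for a range of δ, and the results show how far conclusions depend on the assumption that data are missing at random.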