Centre for Statistics in Medicine, University of Oxford, Wolfson College Annexe, Oxford, OX2 6UD, UK.
BMC Med. 2011 Sep 8;9:103. doi: 10.1186/1741-7015-9-103.
The World Health Organisation estimates that by 2030 there will be approximately 350 million people with type 2 diabetes. Because type 2 diabetes is associated with renal complications, heart disease, stroke and peripheral vascular disease, early identification of patients with undiagnosed type 2 diabetes or those at an increased risk of developing it is an important challenge. We sought to systematically review and critically assess the conduct and reporting of methods used to develop risk prediction models for predicting the risk of having undiagnosed (prevalent) type 2 diabetes or of developing (incident) type 2 diabetes in adults.
We conducted a systematic search of PubMed and EMBASE databases to identify studies published before May 2011 that describe the development of models combining two or more variables to predict the risk of prevalent or incident type 2 diabetes. We extracted key information that describes aspects of developing a prediction model including study design, sample size and number of events, outcome definition, risk predictor selection and coding, missing data, model-building strategies and aspects of performance.
Thirty-nine studies comprising 43 risk prediction models were included. Seventeen studies (44%) reported the development of models to predict incident type 2 diabetes, whilst 15 studies (38%) described the derivation of models to predict prevalent type 2 diabetes. In nine studies (23%), the number of events per variable was less than ten, whilst in fourteen studies there was insufficient information reported for this measure to be calculated. The number of candidate risk predictors ranged from four to sixty-four, and in seven studies it was unclear how many risk predictors were considered. Univariate screening based on statistical significance, a method not recommended for selecting risk predictors for inclusion in the multivariate model, was carried out in eight studies (21%), whilst the selection procedure was unclear in ten studies (26%). Twenty-one risk prediction models (49%) were developed by categorising all continuous risk predictors. The treatment and handling of missing data were not reported in 16 studies (41%).
We found widespread use of poor methods that could jeopardise model development, including univariate pre-screening of variables, categorisation of continuous risk predictors and poor handling of missing data. The use of poor methods affects the reliability of the prediction model and ultimately compromises the accuracy of the estimated probability of having undiagnosed type 2 diabetes or the predicted risk of developing type 2 diabetes. In addition, many studies were characterised by a generally poor level of reporting, often omitting key details needed to objectively judge the usefulness of the models.