Ren Wenhui, Fan Keyu, Liu Zheng, Wu Yanqiu, An Haiyan, Liu Huixin
Department of Clinical Epidemiology and Biostatistics, Peking University People's Hospital, Beijing, China.
Department of Anesthesiology, Peking University People's Hospital, Beijing, China.
J Diabetes. 2025 Jan;17(1):e70049. doi: 10.1111/1753-0407.70049.
Understanding is limited regarding strategies for addressing missing values when developing and validating models to predict cardiovascular disease (CVD) in type 2 diabetes mellitus (T2DM). This study aimed to investigate the presence of, and approaches to, missing data in these prediction models. The MEDLINE electronic database was systematically searched for English-language studies from inception to June 30, 2024. The percentages of missing values, missingness mechanisms, and missing data handling strategies in the included studies were extracted and summarized. This study included 51 articles published between 2001 and 2024: 19 focused solely on prediction model development, 16 incorporated internal validation, and 16 incorporated external validation. Most articles reported missing data in the development (n = 40/51) and external validation (n = 12/16) stages. Missing data were addressed in 74.5% of development studies and 68.8% of validation studies. Imputation was the predominant method in both development (27/40) and validation (7/12), followed by deletion (17/40 and 4/12, respectively). In the model development phase, the number of studies that reported missing data increased from 9 of 15 before 2016 to 31 of 36 from 2016 onward. Although missing values have received much attention in CVD risk prediction models for patients with T2DM, most studies do not adequately report the methods used to address missing data. Improving the quality of prediction models requires clearer reporting and the use of suitable methods to handle missing data.