A tutorial on variable selection for clinical prediction models: feature selection methods in data mining could improve the results.

Author affiliations

Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Velenjak, 1985717413 Tehran, Iran.

Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Velenjak, 1985717413 Tehran, Iran.

Publication information

J Clin Epidemiol. 2016 Mar;71:76-85. doi: 10.1016/j.jclinepi.2015.10.002. Epub 2015 Oct 22.

Abstract

OBJECTIVES

Identifying an appropriate set of predictors for the outcome of interest is a major challenge in clinical prediction research. This study shows how variable selection methods commonly used in data mining can be applied in an epidemiological study, and introduces a systematic approach for doing so.

STUDY DESIGN AND SETTING

The P-value-based method, usually used in epidemiological studies, and several filter and wrapper methods were implemented to select the predictors of diabetes among 55 variables in 803 prediabetic females, aged ≥ 20 years, followed for 10-12 years. To develop a logistic model, variables were selected on a training data set and the resulting model was evaluated on a test data set. The Akaike information criterion (AIC) and the area under the curve (AUC) were used as performance criteria. A full model with all 55 variables was also fitted for comparison.
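The two performance criteria named above can be illustrated with a minimal sketch. This is not the authors' R program; it is a plain-Python illustration that assumes the model's predicted probabilities are already available. The function names (`log_likelihood`, `aic`, `auc`) and the toy data are hypothetical, and AUC is assumed here to mean the area under the ROC curve, computed as the share of positive/negative pairs that the model ranks correctly.

```python
import math


def log_likelihood(y_true, p_hat, eps=1e-12):
    """Bernoulli log-likelihood of binary outcomes given predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def aic(y_true, p_hat, n_params):
    """AIC = 2k - 2 ln(L); lower values indicate a better trade-off of fit and size."""
    return 2 * n_params - 2 * log_likelihood(y_true, p_hat)


def auc(y_true, scores):
    """Probability that a random positive case scores above a random negative case.

    Ties count as one half. This pairwise form is O(n_pos * n_neg), which is
    fine for a sketch but slow for large data sets.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: outcomes and predicted probabilities on a test set.
y_test = [0, 0, 1, 1]
p_test = [0.1, 0.4, 0.35, 0.8]

print("AUC:", auc(y_test, p_test))
print("AIC with 3 parameters:", aic(y_test, p_test, n_params=3))
```

With identical predictions, each extra parameter raises the AIC by 2, which is how the criterion penalizes a large model such as the 55-variable full model. AIC is conventionally computed from the likelihood on the data used for fitting, while AUC is usually reported on held-out data; the abstract does not say which data set each measure was computed on.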

RESULTS

The full model performed worst and the wrapper-based models performed best. Among the filter methods, symmetrical uncertainty gave both the best AUC and the best AIC.
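Symmetrical uncertainty is a normalized form of mutual information, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), ranging from 0 (no shared information) to 1 (each variable fully determines the other). The sketch below is an illustration for discrete variables only, not the implementation used in the study; the function names and toy data are hypothetical, and continuous predictors would need to be discretized first, a step the abstract does not describe.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for two discrete sequences."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so there is nothing to share
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy   # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical toy data: a binary outcome and two candidate predictors.
outcome = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # mirrors the outcome exactly
uninformative = [0, 1, 0, 1]  # independent of the outcome in this sample

print(symmetrical_uncertainty(informative, outcome))
print(symmetrical_uncertainty(uninformative, outcome))
```

As a filter, each candidate predictor would be scored against the outcome in this way and the top-ranked ones kept, without fitting any model during selection. A wrapper, by contrast, scores candidate subsets by repeatedly fitting and evaluating the prediction model itself.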

CONCLUSION

Our experiment showed that the variable selection methods used in data mining could improve the performance of clinical prediction models. An R program was developed to make these methods easier to apply and to visualize the results.
