Sauerbrei Willi, Perperoglou Aris, Schmid Matthias, Abrahamowicz Michal, Becher Heiko, Binder Harald, Dunkler Daniela, Harrell Frank E, Royston Patrick, Heinze Georg
Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany.
Data Science and Artificial Intelligence, AstraZeneca, Cambridge, UK.
Diagn Progn Res. 2020 Apr 2;4:3. doi: 10.1186/s41512-020-00074-3. eCollection 2020.
How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc 'traditional' approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these two challenges have been proposed, but knowledge of their properties and meaningful comparisons between them are scarce. Many outstanding issues in multivariable modelling must therefore be resolved before a state of the art can be defined and evidence-supported guidance can be given to researchers who have only a basic level of statistical knowledge. Our main aims are to identify and illustrate such gaps in the literature and present them at a moderate technical level to the wide community of practitioners, researchers and students of statistics.
We briefly discuss general issues in building descriptive regression models, strategies for variable selection, different ways of choosing functional forms for continuous variables and methods for combining the selection of variables and functions. We discuss two examples, taken from the medical literature, to illustrate problems in the practice of modelling.
Our overview revealed that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, we highlight seven important topics that require further investigation and make suggestions for the direction of further research.
Selection of variables and of functional forms are important topics in multivariable analysis. To define a state of the art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge, further comparative research is required.