Heinze Georg, Dunkler Daniela
Section for Clinical Biometrics, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Vienna, Austria.
Transpl Int. 2017 Jan;30(1):6-10. doi: 10.1111/tri.12895.
Multivariable regression models are often used in transplantation research to identify or confirm baseline variables that have an independent association, whether causal or evidenced only by statistical correlation, with transplantation outcome. Although sound theory is lacking, variable selection is a popular statistical method that seemingly reduces the complexity of such models. In fact, however, variable selection often complicates analysis because it invalidates common tools of statistical inference such as P-values and confidence intervals. This is a particular problem in transplantation research, where sample sizes are often only small to moderate. Furthermore, variable selection requires computer-intensive stability investigations and a particularly cautious interpretation of results. We discuss how five common misconceptions often lead to inappropriate application of variable selection. We emphasize that variable selection and all problems related to it can often be avoided by the use of expert knowledge.
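To illustrate what a stability investigation might look like in practice, the sketch below estimates bootstrap inclusion frequencies, that is, how often each candidate variable survives selection when the procedure is repeated on resampled data. This is a minimal sketch under stated assumptions, not the authors' procedure: the selection rule (AIC-based backward elimination in an ordinary linear model), the simulated data, and the helper names `aic_ols`, `backward_select`, and `inclusion_frequencies` are all hypothetical choices made for illustration.

```python
import numpy as np

def aic_ols(X, y):
    """AIC (up to an additive constant) of a least-squares fit with intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * design.shape[1]

def backward_select(X, y):
    """Backward elimination: drop one variable at a time while AIC improves."""
    selected = list(range(X.shape[1]))
    best = aic_ols(X[:, selected], y)
    while selected:
        # AIC of every model that omits exactly one currently selected variable
        candidates = [
            (aic_ols(X[:, [j for j in selected if j != k]], y), k)
            for k in selected
        ]
        cand_aic, drop = min(candidates)
        if cand_aic >= best:
            break  # no single removal lowers the AIC
        best = cand_aic
        selected.remove(drop)
    return selected

def inclusion_frequencies(X, y, n_boot=200, seed=0):
    """Share of bootstrap resamples in which each variable is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        for j in backward_select(X[idx], y[idx]):
            counts[j] += 1
    return counts / n_boot

# Simulated example (assumed data): only the first of four variables
# is truly associated with the outcome; the other three are pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] + rng.normal(size=60)
freq = inclusion_frequencies(X, y)
print(freq)
```

In a setup like this, a strong predictor would be expected to be selected in nearly every resample, whereas noise variables are typically selected in some resamples and not in others. That variability is the kind of instability that makes P-values and confidence intervals from a single selected model hard to interpret.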