O'Gorman T W, Woolson R F
Department of Mathematical Sciences, Northern Illinois University, DeKalb 60115.
Stat Med. 1993 Jan 30;12(2):143-51. doi: 10.1002/sim.4780120206.
We evaluated the performance of four stepwise variable selection procedures commonly used in medical and epidemiologic research: discriminant analysis, logistic regression, and the rank transformed version of each, in which the independent variables are replaced by their ranks. We generated, by computer, data for two groups from several distributions with a variety of sample sizes and covariance matrices. For data generated from log-normal or contaminated distributions, both rank transformed procedures increased the chance of correctly selecting the variables related to group membership. For normally distributed data, the rank transformation had little effect on variable selection. Rank transformed discriminant analysis and rank transformed logistic regression were equally effective in selecting variables when sample sizes exceeded 100; rank transformed discriminant analysis was superior for smaller data sets. We discuss the implications of these results for clinical and epidemiologic research.
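The rank transformation described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes that each independent variable is ranked over the pooled observations from both groups and that ties receive average ranks; the abstract specifies neither detail. The function names, sample sizes, and log-normal parameters are hypothetical choices for illustration only.

```python
import random


def average_ranks(values):
    """Return 1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run [start, end] over all positions holding the same value.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_transform(rows):
    """Replace each column of rows (observations x variables) by its ranks.

    Assumption: ranks are computed over the pooled sample, not within group.
    """
    columns = list(zip(*rows))
    ranked_columns = [average_ranks(list(col)) for col in columns]
    return [list(row) for row in zip(*ranked_columns)]


# Hypothetical data: two groups of 20, three log-normal variables each.
# Only the first variable differs between groups (illustrative choice).
rng = random.Random(0)
group0 = [[rng.lognormvariate(0.0, 1.0) for _ in range(3)] for _ in range(20)]
group1 = [
    [rng.lognormvariate(0.5, 1.0),
     rng.lognormvariate(0.0, 1.0),
     rng.lognormvariate(0.0, 1.0)]
    for _ in range(20)
]
pooled = group0 + group1
labels = [0] * 20 + [1] * 20

# The ranked matrix would replace the raw independent variables as input
# to a stepwise discriminant or logistic procedure (not shown here).
ranked = rank_transform(pooled)
```

The group labels are left untouched; only the independent variables are replaced by ranks, which bounds the influence of the extreme values that log-normal or contaminated data produce.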