Barón A E
Department of Preventive Medicine and Biometrics, University of Colorado Health Sciences Center, Denver 80262.
Stat Med. 1991 May;10(5):757-66. doi: 10.1002/sim.4780100511.
Methods of multiple group discriminant analysis have not been fully studied with respect to classification into more than two populations when the covariate distributions are normal or non-normal. The present study examines the classification performance of several multiple discrimination methods under a variety of simulated continuous normal and non-normal covariate distributions. The methods include polychotomous logistic regression, multiple group linear discriminant analysis, kernel density estimation, and rank transformations of the data as input into the linear function. The parameters of interest were distance among populations, configuration of population mean vectors (collinear or forming the vertices of a regular simplex), skewness, kurtosis and bimodality. Simulation of the last three parameters was by log-normal, sinh⁻¹-normal and a two-component mixture of normal distributions, respectively. Results with three trivariate populations show that for all distributions, logistic discrimination classifies close to the optimal under Neyman-Pearson allocation. These results suggest that logistic discrimination is preferable to other widely-used methods for multiple group classification with non-normal data, and is comparable to classification by multiple linear discrimination with normal data.
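The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the author's code: the function names (`mean_vectors`, `draw`, `sample_population`), the scale `sigma`, the mixture separation `sep`, the equal mixing weights, and the use of independent coordinates with a simple location shift are all assumptions made for illustration. The abstract does not state the actual parameter values or covariance structure.

```python
import math
import random


def mean_vectors(config, d):
    """Three trivariate location vectors (hypothetical parametrization).

    "collinear": equally spaced along one axis, adjacent spacing d.
    "simplex":   vertices of an equilateral triangle with side d,
                 i.e. a regular simplex on three points.
    """
    if config == "collinear":
        return [(0.0, 0.0, 0.0), (d, 0.0, 0.0), (2.0 * d, 0.0, 0.0)]
    if config == "simplex":
        return [
            (0.0, 0.0, 0.0),
            (d, 0.0, 0.0),
            (d / 2.0, d * math.sqrt(3.0) / 2.0, 0.0),
        ]
    raise ValueError(f"unknown configuration: {config}")


def draw(kind, rng, sigma=0.5, sep=2.0):
    """Draw one univariate value from the named family.

    sigma and sep are assumed values, not taken from the paper.
    """
    z = rng.gauss(0.0, 1.0)
    if kind == "normal":
        return z
    if kind == "lognormal":
        # Skewness: exp of a normal variate.
        return math.exp(sigma * z)
    if kind == "sinh_inv_normal":
        # Kurtosis: X such that asinh(X) is standard normal.
        return math.sinh(z)
    if kind == "mixture":
        # Bimodality: equal-weight mixture of N(-sep, 1) and N(+sep, 1).
        shift = sep if rng.random() < 0.5 else -sep
        return z + shift
    raise ValueError(f"unknown distribution: {kind}")


def sample_population(location, kind, n, rng):
    """Draw n trivariate observations with independent coordinates,
    each shifted by the corresponding entry of the location vector."""
    return [tuple(m + draw(kind, rng) for m in location) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for config in ("collinear", "simplex"):
        locations = mean_vectors(config, d=2.0)
        groups = [sample_population(m, "lognormal", 100, rng) for m in locations]
        print(config, [len(g) for g in groups])
```

Samples generated this way could then be passed to whichever classifiers are being compared, for example a multinomial logistic model and a linear discriminant function, with misclassification rates estimated on held-out draws. Note that for the non-normal families the location vector shifts the distribution but is not its mean.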