Classification trees as an alternative to linear discriminant analysis.

Author information

Feldesman Marc R

Affiliation

Department of Anthropology, Portland State University, Portland, Oregon 97207, USA.

Publication information

Am J Phys Anthropol. 2002 Nov;119(3):257-75. doi: 10.1002/ajpa.10102.

Abstract

Linear discriminant analysis (LDA) is frequently used for classification/prediction problems in physical anthropology, but it is unusual to find examples where researchers consider the statistical limitations and assumptions required for this technique. In these instances, it is difficult to know whether the predictions are reliable. This paper considers a nonparametric alternative to predictive LDA: binary, recursive (or classification) trees. This approach has the advantage that data transformation is unnecessary, cases with missing predictor variables do not require special treatment, prediction success is not dependent on data meeting normality conditions or covariance homogeneity, and variable selection is intrinsic to the methodology. Here I compare the efficacy of classification trees with LDA, using typical morphometric data. With data from modern hominoids, the results show that the two techniques perform almost equally well. With complete data sets, LDA may be a better choice, as is shown in this example, but with missing observations, classification trees perform outstandingly well, whereas commercial discriminant analysis programs do not predict classifications for cases with incompletely measured predictor variables and generally are not designed to address the problem of missing data. Testing of data prior to analysis is necessary, and classification trees are recommended either as a replacement for LDA or as a supplement whenever data do not meet relevant assumptions. They are highly recommended as an alternative to LDA whenever the data set contains important cases with missing predictor variables.
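To make the idea of a binary, recursive classification tree that tolerates missing predictors more concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation: the toy measurements and the class labels "A" and "B" are invented for illustration, the function names (`gini`, `partition`, `best_split`, `build`, `predict`) are hypothetical, and missing values are handled by sending a case down the branch that holds more of the observed cases. That routing rule is a simplifying assumption; classical tree software typically uses surrogate splits instead.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def partition(rows, f, t):
    """Split row indices on feature f at threshold t.
    ASSUMPTION: rows missing feature f (None) follow the larger observed branch."""
    left = [i for i, r in enumerate(rows) if r[f] is not None and r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] is not None and r[f] > t]
    default_left = len(left) >= len(right)
    missing = [i for i, r in enumerate(rows) if r[f] is None]
    (left if default_left else right).extend(missing)
    return left, right, default_left

def best_split(rows, labels):
    """Return (score, feature, threshold) with the lowest weighted Gini impurity,
    or None if no feature has two distinct observed values."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows if r[f] is not None})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold midway between neighbours
            left, right, _ = partition(rows, f, t)
            score = sum(len(side) * gini([labels[i] for i in side])
                        for side in (left, right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(rows, labels, depth=0, max_depth=3):
    """Recursively grow a binary tree; variable selection happens in best_split."""
    majority = Counter(labels).most_common(1)[0][0]
    split = None
    if depth < max_depth and len(set(labels)) > 1:
        split = best_split(rows, labels)
    if split is None:
        return {"leaf": majority}
    _, f, t = split
    left, right, default_left = partition(rows, f, t)

    def child(idx):
        return build([rows[i] for i in idx], [labels[i] for i in idx],
                     depth + 1, max_depth)

    return {"feature": f, "threshold": t, "default_left": default_left,
            "left": child(left), "right": child(right)}

def predict(node, row):
    """Classify one case; a missing value follows the node's default branch."""
    while "leaf" not in node:
        v = row[node["feature"]]
        go_left = node["default_left"] if v is None else v <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]

# Invented toy data: two measurements per case, None marks a missing value.
rows = [(10.0, 5.0), (11.0, None), (12.0, 5.5), (10.5, 4.8),
        (20.0, 9.0), (21.0, 9.5), (None, 9.2), (19.5, 8.8)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

tree = build(rows, labels)
print([predict(tree, r) for r in rows])
print(predict(tree, (None, 9.1)), predict(tree, (10.8, None)))
```

The sketch needs no data transformation and makes no normality or covariance assumptions, and incompletely measured cases still receive a prediction, which is the property the abstract emphasizes. It does not reproduce the LDA comparison or any result reported in the paper.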
