Institute for Quaternary and Climate Studies and Department of Biological Sciences, University of Maine, Orono, Maine 04469 USA
Am J Bot. 2002 Sep;89(9):1459-67. doi: 10.3732/ajb.89.9.1459.
Little is known about the paleoecological histories of the three spruce species (white spruce, Picea glauca; black spruce, P. mariana; and red spruce, P. rubens) in eastern North America, largely because of the difficulty of separating the three species in the pollen record. We describe a novel and effective classification method for distinguishing pollen grains on the basis of quantitative analysis of grain attributes. The method is illustrated by an analysis of a large sample of modern pollen grains (522 grains from 38 collections) of the three Picea species, collected from the region where the three species co-occur today. For each species X we computed a binary regression tree that classified each grain either as X or as not-X; the three determinations for each grain were then combined as Hamming codes in an error/uncertainty detection procedure. Using Hamming codes to link multiple binary trees for error detection allowed problematic specimens to be identified and excluded, with correspondingly greater classification certainty among the remaining grains. We measured 13 attributes of 419 reference grains of the three species to construct the regression trees and tested the method by classifying 103 other reference grains. Species-specific accuracies among the reliably classified grains were 100, 77, and 76% for P. glauca, P. mariana, and P. rubens, respectively, and 21, 30, and 22% of the grains of each species, respectively, were problematic. The method is applicable to any multi-species classification problem for which a large reference sample is available.
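The combination step can be illustrated with a minimal Python sketch. The abstract does not give the codebook or the tree outputs, so everything here is an assumption made for illustration: the one-hot codebook, the bit order, the function names `combine_votes`, `hamming_distance`, and `min_codebook_distance`, and the example votes. It shows only the general idea of treating the three X/not-X determinations as a codeword and flagging any grain whose codeword is not valid.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence, Tuple

Code = Tuple[int, int, int]

# Assumed codebook (not taken from the paper): one valid codeword per species,
# with bit order (glauca tree, mariana tree, rubens tree).
CODEBOOK: Dict[Code, str] = {
    (1, 0, 0): "P. glauca",
    (0, 1, 0): "P. mariana",
    (0, 0, 1): "P. rubens",
}


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def combine_votes(votes: Sequence[int]) -> Optional[str]:
    """Return the species for a valid codeword, or None for a problematic grain."""
    code = tuple(1 if v else 0 for v in votes)
    return CODEBOOK.get(code)


def min_codebook_distance() -> int:
    """Smallest Hamming distance between any two valid codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(CODEBOOK, 2))


if __name__ == "__main__":
    # Hypothetical tree outputs for four grains (illustrative, not measured data).
    grains = [(1, 0, 0), (0, 0, 1), (1, 1, 0), (0, 0, 0)]
    for votes in grains:
        label = combine_votes(votes)
        print(votes, "->", label if label is not None else "problematic (excluded)")
```

In this assumed codebook the valid codewords are at Hamming distance 2 from one another, so a single wrong tree output always yields an invalid codeword. Such an error can be detected but not corrected, which fits the approach of excluding problematic grains instead of reassigning them.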