Behaviour of resampling methods under different weighting schemes, measures and variable resampling strengths.

Author information

Kopuchian Cecilia, Ramírez Martín J

Affiliations

División Ornitología.

División Aracnología, CONICET, Museo Argentino de Ciencias Naturales "Bernardino Rivadavia", Av. Ángel Gallardo 470, C1405DJR, Buenos Aires, Argentina.

Publication information

Cladistics. 2010 Feb;26(1):86-97. doi: 10.1111/j.1096-0031.2009.00269.x. Epub 2009 Aug 25.

DOI:10.1111/j.1096-0031.2009.00269.x

PMID:34875757

Abstract

We compared general behaviour trends of resampling methods (bootstrap, bootstrap with Poisson distribution, jackknife, and jackknife with symmetric resampling) and of different ways to summarize resampling results (absolute frequency, F, and frequency difference, GC') for real data sets under variable resampling strengths in three weighting schemes. We propose an equivalence between bootstrap and jackknife in order to make bootstrap variable across different resampling strengths. Specifically, for each method we evaluated the number of spurious groups (groups not present in the strict consensus of the unaltered data set), the number of real groups, and the number of inconsistencies in the ranking of groups under variable resampling strengths. We found that GC' always generated more spurious groups and recovered more groups than F. Bootstrap methods generated more spurious groups than jackknife methods, and jackknife was the method that recovered more real groups. We consistently obtained a higher proportion of spurious groups for GC' than for F, and for bootstrap than for jackknife. Finally, we evaluated the ranking of groups under variable resampling strengths qualitatively, from the trajectories of "support" against resampling strength, and quantitatively, with Kendall coefficient values. We found fewer ranking inconsistencies for GC' than for F, and for bootstrap than for jackknife. © The Willi Hennig Society 2009.
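
The resampling schemes and the two summaries named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical. The per-character weighting rules follow the usual textbook descriptions of each scheme, the symmetric-resampling rule (equal probability of up-weighting and of deleting a character) is an assumption, and the frequency difference is computed as the frequency of a group minus the frequency of its most frequent contradicting group, which may differ in detail from the GC' used in the paper. Groups are represented as frozensets of taxon names from rooted trees.

```python
import math
import random


def bootstrap_weights(n_chars, rng):
    """Standard bootstrap: draw n_chars characters with replacement."""
    weights = [0] * n_chars
    for _ in range(n_chars):
        weights[rng.randrange(n_chars)] += 1
    return weights


def poisson_bootstrap_weights(n_chars, rng, lam=1.0):
    """Poisson bootstrap: each character independently gets a Poisson(lam) weight."""
    threshold = math.exp(-lam)
    weights = []
    for _ in range(n_chars):
        k, p = 0, rng.random()
        while p > threshold:  # Knuth's multiplication method
            k += 1
            p *= rng.random()
        weights.append(k)
    return weights


def jackknife_weights(n_chars, rng, p_delete):
    """Jackknife: delete each character independently with probability p_delete."""
    return [0 if rng.random() < p_delete else 1 for _ in range(n_chars)]


def symmetric_weights(n_chars, rng, p_change):
    """Symmetric resampling (assumed rule): up-weight with probability p_change,
    delete with probability p_change, otherwise leave the weight at 1."""
    weights = []
    for _ in range(n_chars):
        u = rng.random()
        weights.append(2 if u < p_change else 0 if u < 2 * p_change else 1)
    return weights


def conflicts(a, b):
    """Two groups conflict if they overlap and neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def absolute_frequency(group, replicate_groups):
    """F: fraction of replicates in which the group appears."""
    return sum(group in groups for groups in replicate_groups) / len(replicate_groups)


def frequency_difference(group, replicate_groups):
    """Frequency of the group minus that of its most frequent contradicting group."""
    n = len(replicate_groups)
    counts = {}
    for groups in replicate_groups:
        for g in groups:
            counts[g] = counts.get(g, 0) + 1
    own = counts.get(group, 0) / n
    rival = max((c / n for g, c in counts.items() if conflicts(group, g)), default=0.0)
    return own - rival
```

In this sketch the resampling strength is the `p_delete` or `p_change` argument. Each weight vector would be passed to a tree search, and the groups found in each replicate would be collected into `replicate_groups` before computing either summary.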
