Elfadaly Fadlalla G, Garthwaite Paul H, Crawford John R
Department of Mathematics and Statistics, The Open University, UK; Department of Statistics, Faculty of Economics and Political Science, Cairo University, Egypt.
Department of Mathematics and Statistics, The Open University, UK.
Comput Stat Data Anal. 2016 Jul;99:115-130. doi: 10.1016/j.csda.2016.01.014.
Mahalanobis distance may be used as a measure of the disparity between an individual's profile of scores and the average profile of a population of controls. The degree to which the individual's profile is unusual can then be equated to the proportion of the population who would have a larger Mahalanobis distance than the individual. Several estimators of this proportion are examined. These include plug-in maximum likelihood estimators, medians, the posterior mean from a Bayesian probability matching prior, an estimator derived from a Taylor expansion, and two forms of polynomial approximation, one based on Bernstein polynomials and one on a quadrature method. Simulations show that some estimators, including the commonly used plug-in maximum likelihood estimators, can have substantial bias for small or moderate sample sizes. The polynomial approximations yield estimators that have low bias, with the quadrature method marginally preferable to Bernstein polynomials. However, the polynomial estimators sometimes yield infeasible estimates that lie outside the range 0 to 1. While none of the estimators is perfectly unbiased, the median estimators match their definition: in simulations, their estimates of the proportion have a median error close to zero. The standard median estimator can give unrealistically small estimates (including 0), and an adjustment is proposed that ensures estimates are always credible. This adjusted median estimator has much to recommend it when unbiasedness is not of paramount importance, while the quadrature method is recommended when bias is the dominant issue.
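As a rough illustration of the quantity being estimated, the sketch below computes a plug-in estimate of the proportion of the control population with a larger Mahalanobis distance than a single case. It rests on assumptions not stated in the abstract: the controls are treated as a sample from a multivariate normal distribution, so that with known parameters the squared distance follows a chi-square distribution with k degrees of freedom, and the sample mean and covariance are simply substituted for the population values. The function name `plug_in_proportion`, the covariance divisor (n, via `bias=True`), and the simulated data are hypothetical choices for illustration only. This is not the paper's implementation, and none of the median, Bayesian, Taylor, or polynomial estimators is reproduced here.

```python
import numpy as np
from scipy.stats import chi2


def plug_in_proportion(controls, case):
    """Return (squared Mahalanobis distance, plug-in proportion).

    controls: array of shape (n, k), one row per control, one column per test.
    case:     array of shape (k,), the individual's profile of scores.
    """
    controls = np.asarray(controls, dtype=float)
    case = np.asarray(case, dtype=float)
    k = controls.shape[1]

    mean = controls.mean(axis=0)
    # Assumption: covariance with divisor n (bias=True); divisor n - 1 is another common choice.
    cov = np.atleast_2d(np.cov(controls, rowvar=False, bias=True))

    diff = case - mean
    # Solve cov @ x = diff rather than forming the inverse explicitly.
    d2 = float(diff @ np.linalg.solve(cov, diff))

    # Under normality with known parameters, d2 ~ chi-square(k); the upper
    # tail probability is the proportion with a larger distance.
    return d2, float(chi2.sf(d2, df=k))


# Hypothetical usage with simulated controls (30 controls, 3 tests).
rng = np.random.default_rng(0)
controls = rng.normal(size=(30, 3))
case = np.array([1.5, -2.0, 0.5])
d2, proportion = plug_in_proportion(controls, case)
print(f"squared distance = {d2:.3f}, plug-in proportion = {proportion:.3f}")
```

Because the estimated mean and covariance are treated as if they were the true values, an estimate of this kind ignores sampling error in the controls, which is consistent with the abstract's remark that plug-in estimators can be substantially biased when the control sample is small or moderate.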