Lipin Mikhail Y, Crampton Elias Benjamin, Thomas Steven A
Department of Pharmacology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
Fox School of Business and Management, Temple University, Philadelphia, Pennsylvania, United States of America.
bioRxiv. 2025 Aug 8:2025.08.05.668614. doi: 10.1101/2025.08.05.668614.
Data in experimental biology are frequently marred by outliers and asymmetric distributions. The median, being a robust estimator of central tendency, is less sensitive to outliers than the mean. However, for ranked datasets with an even number of observations, the conventional median, calculated as the average of the two middle values, can introduce bias by implicitly assuming symmetry in the data distribution. This study aims to identify a median estimator that is unbiased. To derive the unbiased median estimator, we minimized the sum of residuals raised to a rational power approaching one. We compared the properties of the unbiased and conventional medians using Poisson-distributed datasets. Random samples were generated with the Mersenne Twister algorithm implemented in IgorPro software (WaveMetrics Inc., Oregon). For odd sample sizes, the unbiased median coincides with the conventional median (the middle value). For even sample sizes, the unbiased median is defined as the value at which the product of the distances to the data points above it equals the product of the distances to the data points below it; in asymmetric distributions this definition differs from the conventional median. Although both median estimators tend to underestimate the mean of Poisson-distributed data, the unbiased median is consistently closer to the expected value. Additionally, the unbiased median exhibits lower variance than the conventional median. Thus, for even sample sizes, the proposed unbiased median provides a central tendency measure that is unbiased, more accurate, and has reduced variance relative to the conventional median.
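The even-sample definition can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the authors' IgorPro implementation. Assumptions: the estimate m lies between the two middle order statistics; the products are compared on a log scale and the root is found by bisection; when the two middle values are tied, that shared value is returned. The function name `product_balanced_median` and its parameters are hypothetical.

```python
import math


def product_balanced_median(values, tol=1e-12, max_iter=200):
    """Sketch of the even-n median described in the abstract (assumed reading).

    For even n, find m between the two middle values such that
    prod(m - x_i for x_i below m) == prod(x_i - m for x_i above m).
    For odd n, return the middle value (the conventional median).
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return float(xs[mid])

    lo, hi = float(xs[mid - 1]), float(xs[mid])
    if lo == hi:
        # Assumption: tied middle values leave no interval to search.
        return lo

    below, above = xs[:mid], xs[mid:]

    def log_imbalance(m):
        # log(product of distances below) - log(product of distances above);
        # strictly increasing in m on the open interval (lo, hi).
        return (sum(math.log(m - x) for x in below)
                - sum(math.log(x - m) for x in above))

    a, b = lo, hi
    for _ in range(max_iter):
        m = 0.5 * (a + b)
        if m <= a or m >= b:  # interval exhausted at float resolution
            break
        if log_imbalance(m) > 0:
            b = m  # product below is too large: move left
        else:
            a = m  # product above is too large: move right
        if b - a <= tol * max(1.0, abs(m)):
            break
    return 0.5 * (a + b)


if __name__ == "__main__":
    sample = [1, 2, 3, 10]  # right-skewed, even n
    print(product_balanced_median(sample))  # approximately 2.8
```

For the sample `[1, 2, 3, 10]`, the condition (m - 1)(m - 2) = (3 - m)(10 - m) reduces to 10m = 28, so m = 2.8, whereas the conventional median is 2.5. For a symmetric sample such as `[1, 2, 3, 4]`, both give 2.5.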