Moskowitz Chaya S, Seshan Venkatraman E, Riedel Elyn R, Begg Colin B
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 307 East 63rd Street, 3rd Floor, New York, NY 10021, USA.
Stat Med. 2008 Jul 20;27(16):3191-208. doi: 10.1002/sim.3151.
The Lorenz curve is a graphical tool widely used to characterize the concentration of a measure, such as wealth, in a population. The measure used to rank experimental units when estimating the empirical Lorenz curve, and the corresponding Gini coefficient, is frequently subject to random error. This error can produce an incorrect ranking of experimental units, which inevitably leads to a curve that exaggerates the degree of concentration (variation) in the population. We consider a specific hierarchical data configuration in which multiple observations are aggregated within experimental units to form the outcome whose distribution is of interest. Within this context, we explore the bias and discuss several widely available statistical methods that have the potential to reduce or remove it from the empirical Lorenz curve. The properties of these methods are examined and compared in a simulation study. This work is motivated by a health outcomes application that seeks to assess the concentration of black patient visits among primary care physicians. The methods are illustrated on data from this study.
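The bias described above can be illustrated with a minimal sketch. It is not the authors' simulation design: the Beta(2, 8) distribution for the true unit-level proportions, the 500 units, the 20 visits per unit, and the function names `lorenz_curve`, `gini`, and `simulate` are all assumptions chosen for illustration. Each unit's observed proportion is a binomial count divided by the number of visits, so it carries random error around the true proportion, and the Gini coefficient computed from the observed proportions tends to exceed the one computed from the true proportions.

```python
import random


def lorenz_curve(values):
    """Return the empirical Lorenz curve as a list of (p, L(p)) points."""
    ordered = sorted(values)          # rank units from smallest to largest
    total = sum(ordered)
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini coefficient: 1 minus twice the area under the Lorenz curve."""
    pts = lorenz_curve(values)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0   # trapezoid rule
    return 1.0 - 2.0 * area


def simulate(n_units=500, visits_per_unit=20, seed=0):
    """Compare the Gini of true proportions with the Gini of noisy estimates.

    All settings here are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Assumed distribution of true unit-level proportions.
    true_p = [rng.betavariate(2, 8) for _ in range(n_units)]
    # Observed proportion: binomial count over a fixed number of visits.
    observed_p = [
        sum(rng.random() < p for _ in range(visits_per_unit)) / visits_per_unit
        for p in true_p
    ]
    return gini(true_p), gini(observed_p)


if __name__ == "__main__":
    g_true, g_obs = simulate()
    print(f"Gini of true proportions:     {g_true:.3f}")
    print(f"Gini of observed proportions: {g_obs:.3f}")
```

Under these assumptions, increasing `visits_per_unit` shrinks the within-unit sampling error, so the observed Gini coefficient should move closer to the true one.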