Ledberg Anders, Wennberg Peter
Centre for Social Research on Alcohol and Drugs, SoRAD, Stockholm University, SE-10691 Stockholm, Sweden.
BMC Med Res Methodol. 2014 Apr 27;14:58. doi: 10.1186/1471-2288-14-58.
Prevalence estimates of drug use, or of its consequences, are considered important in many contexts and may have substantial influence over public policy. However, it is rarely possible to simply count the relevant individuals, particularly when the defining characteristic may be illegal, as is the case for drug use. Consequently, methods are needed to estimate the size of such partly 'hidden' populations, and many such methods have been developed and used within epidemiology, including in studies of alcohol and drug use. Here we introduce a method appropriate for estimating the size of human populations given a single source of data, for example entries in a health-care registry.
The setup is the following: during a fixed time period, e.g. a year, individuals belonging to the target population have a non-zero probability of being "registered". Each individual might be registered multiple times, and the time points of the registrations are recorded. Assuming that the population is closed and that the probability of being registered at least once is constant, we derive a family of maximum likelihood (ML) estimators of total population size. We study the ML estimator using Monte Carlo simulations and delimit the range of cases where it is useful. In particular, we investigate the effect of making the population heterogeneous with respect to the probability of being registered.
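To make the setup concrete, the following is a minimal sketch of how single-source registration data of this kind could be simulated. It is not the authors' simulation code: the function name `simulate_registrations`, the per-individual `rates` argument, and the choice to model each individual's registrations as a homogeneous Poisson process are all assumptions made for illustration.

```python
import random


def simulate_registrations(rates, period=1.0, seed=0):
    """Simulate a closed population observed through one registry.

    rates  : per-individual registration rates (hypothetical parameter);
             each must be > 0. Equal rates give a homogeneous population,
             unequal rates a heterogeneous one.
    period : length of the observation window (e.g. 1.0 for one year).

    Assumption: registrations of each individual follow a Poisson process,
    so waiting times between registrations are exponential.
    Returns a dict mapping individual index -> sorted registration times.
    Individuals never registered are absent, as in a real registry.
    """
    rng = random.Random(seed)
    registry = {}
    for person, rate in enumerate(rates):
        times = []
        t = rng.expovariate(rate)
        while t < period:
            times.append(t)
            t += rng.expovariate(rate)
        if times:
            registry[person] = times
    return registry


# Homogeneous example: 2000 individuals, each with rate 0.3 per period,
# so the chance of at least one registration is 1 - exp(-0.3), about 26%.
homogeneous = simulate_registrations([0.3] * 2000)

# Heterogeneous example: rates drawn from a gamma distribution with mean 0.3.
rate_rng = random.Random(1)
hetero_rates = [rate_rng.gammavariate(2.0, 0.15) for _ in range(2000)]
heterogeneous = simulate_registrations(hetero_rates)
```

Only the registered individuals and their time points are visible to an estimator; the total number of entries in `rates` plays the role of the unknown population size.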
The new estimator is asymptotically unbiased, and we show that high-precision estimates can be obtained for samples covering as little as 25% of the total population size. However, if the total population size is small (say, on the order of 500), a larger fraction needs to be sampled to achieve reliable estimates. Further, we show that the estimator gives reliable estimates even when individuals differ in the probability of being registered. We also compare the ML estimator to an estimator known as Chao's estimator and show that the latter can have a substantial bias when applied to epidemiological data.
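For reference, the comparison estimator can be sketched in a few lines. The version below is the commonly cited form of Chao's estimator, which uses only the number of individuals registered exactly once (f1) and exactly twice (f2); whether this matches the exact variant used in the paper is an assumption. The function name `chao_estimate` is hypothetical, and the fallback used when f2 is zero is the usual bias-corrected form, also an assumption here. The ML estimator introduced in the paper is not reproduced.

```python
from collections import Counter


def chao_estimate(counts):
    """Chao's population size estimate from per-individual registration counts.

    counts : number of registrations for each *observed* individual
             (every value is at least 1).

    Commonly cited form: N_hat = n_obs + f1^2 / (2 * f2), where f1 and f2
    are the numbers of individuals seen exactly once and exactly twice.
    """
    freq = Counter(counts)
    n_obs = sum(freq.values())
    f1, f2 = freq[1], freq[2]
    if f2 == 0:
        # Assumption: fall back to the bias-corrected form
        # f1 * (f1 - 1) / (2 * (f2 + 1)) to avoid division by zero.
        return n_obs + f1 * (f1 - 1) / 2.0
    return n_obs + f1 * f1 / (2.0 * f2)


# Seven observed individuals: four seen once, two seen twice, one seen
# three times. The estimate is 7 + 4**2 / (2 * 2).
print(chao_estimate([1, 1, 1, 1, 2, 2, 3]))  # 11.0
```

Note that this estimator discards the registration time points and all counts above two, which is one way in which it differs from an approach that uses the recorded times.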
The population size estimator suggested herein complements existing methods and is less sensitive to certain types of dependencies typical in epidemiological data.