Estimating population size and duplication rates when records cannot be linked.

Author information

Laska Eugene M, Meisner Morris, Wanderling Joseph, Siegel Carole

Affiliation

Statistical Sciences and Epidemiology Division, The Nathan S. Kline Institute for Psychiatric Research, 140 Orangeburg Road, Orangeburg, NY, USA.

Publication information

Stat Med. 2003 Nov 15;22(21):3403-17. doi: 10.1002/sim.1640.

Abstract

The capture-recapture approach to estimating the size of a population is a well-studied area of statistics. The number of distinct individuals, N(A) and N(B), on each of two lists, A and B, and the number common to both lists, N(AB), are used to form an estimate of the binomial probability of being on one of the lists, which then allows an estimate to be made of the size of the population. Critical to the method is an accurate count of N(AB). We consider situations in which this count is not available. Such problems arise in a variety of behavioural health contexts in which the need for protection of privacy may prevent sharing identifying information, so it is not possible to specifically match an individual who appears on one list with an individual on the other. Suppose that the birth dates and/or other demographics of individuals on each list are known. We introduce two methods for estimating the duplication rates and the size of the population. Conditioning on the set β of birth dates of those on list B, N(A) and N(B), the maximum likelihood estimators (MLEs) and their variances are derived. The MLEs are based on the proportion of individuals on list A whose birth dates fall in β. This approach is particularly useful if list B itself contains duplicates. The second model utilizes the full sample distribution of the birth dates. We generalize this approach to accommodate multiple demographic characteristics. The approaches are applied to the problem of estimating duplication rates and the population size of veterans who have mental illness in Kings County, NY. The data are lists of those receiving service from the Veterans Administration system and from providers funded or certified by the New York State Office of Mental Health.
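
The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' estimator; it is a simplified illustration built under stated assumptions. The classical two-list estimate takes N(AB)/N(B) as the probability of being on list A, giving N_hat = N(A)·N(B)/N(AB). When N(AB) cannot be counted, the sketch conditions on the set β of birth dates seen on list B. Everyone on both lists falls in β. Someone on A but not on B falls in β only by coincidence, with probability q = |β|/D. The observed share p of list A inside β then satisfies p = ρ + (1 − ρ)q, which can be solved for the overlap fraction ρ. The assumptions, all hypothetical, are: birth dates are uniform over D possible days, they are independent of list membership, and neither list contains duplicates (the paper's first method is explicitly aimed at the case where list B does). The function names and the simulated population are illustrative only.

```python
import random
from typing import List, Sequence, Tuple


def lincoln_petersen(n_a: int, n_b: int, n_ab: float) -> float:
    """Two-list estimate: P(on A) ~ N(AB)/N(B), so N_hat = N(A) * N(B) / N(AB)."""
    if n_ab <= 0:
        raise ValueError("the overlap N(AB) must be positive")
    return n_a * n_b / n_ab


def overlap_from_birth_dates(dates_a: Sequence[int], dates_b: Sequence[int],
                             n_dates: int) -> float:
    """Estimate N(AB) from birth dates alone, without linking records.

    ASSUMPTIONS (illustrative, not the paper's model): birth dates are uniform
    over n_dates possible values, independent of list membership, and neither
    list contains duplicates.
    """
    if not dates_a:
        raise ValueError("list A is empty")
    beta = set(dates_b)                    # the set beta of birth dates seen on list B
    q = len(beta) / n_dates                # chance someone NOT on B still falls in beta
    if q >= 1:
        raise ValueError("beta covers every possible date; overlap not identifiable")
    p_hat = sum(d in beta for d in dates_a) / len(dates_a)  # share of A inside beta
    # Everyone on both lists falls in beta, so p = rho + (1 - rho) * q; solve for rho.
    rho_hat = (p_hat - q) / (1 - q)
    rho_hat = min(max(rho_hat, 0.0), 1.0)  # keep the estimate a valid proportion
    return rho_hat * len(dates_a)


def simulate(n_true: int = 20_000, p_a: float = 0.3, p_b: float = 0.2,
             n_dates: int = 21_900, seed: int = 1) -> Tuple[List[int], List[int], int]:
    """Draw two independent lists from a synthetic population (hypothetical data)."""
    rng = random.Random(seed)
    dates_a: List[int] = []
    dates_b: List[int] = []
    true_overlap = 0
    for _ in range(n_true):
        d = rng.randrange(n_dates)         # birth date as a day index
        on_a = rng.random() < p_a
        on_b = rng.random() < p_b
        if on_a:
            dates_a.append(d)
        if on_b:
            dates_b.append(d)
        if on_a and on_b:
            true_overlap += 1
    return dates_a, dates_b, true_overlap


if __name__ == "__main__":
    dates_a, dates_b, true_overlap = simulate()
    est_overlap = overlap_from_birth_dates(dates_a, dates_b, n_dates=21_900)
    n_hat = lincoln_petersen(len(dates_a), len(dates_b), est_overlap)
    print(f"true N(AB) = {true_overlap}, estimated N(AB) = {est_overlap:.0f}")
    print(f"estimated population size = {n_hat:.0f} (true 20000)")
```

Conditioning on β means only the coincidence rate q has to be modelled. Real birth dates are not uniform, so in practice q would have to come from a model of the observed birth-date distribution rather than from |β|/D.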
