Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA.
Stat Med. 2011 Feb 28;30(5):560-8. doi: 10.1002/sim.3920. Epub 2011 Feb 3.
Public health practitioners are often called upon to make inferences about a health indicator for a population at large when the only available information is data gathered from a convenience sample, such as visitors to a clinic. These data may be of the highest quality and quite extensive, but the biases inherent in a convenience sample preclude the legitimate use of the powerful inferential tools usually associated with a random sample. In general, we know nothing about those who do not visit the clinic beyond the fact that they do not visit it. An alternative is to take a random sample of the population. However, we show that this solution would be wasteful if it excluded the use of available information. Hence, we present a simple annealing methodology that combines a relatively small, and presumably far less expensive, random sample with the convenience sample. This not only allows us to take advantage of powerful inferential tools, but also provides more accurate information than is available from the random sample alone.
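The abstract does not spell out the estimator, so the following is only an illustrative sketch of one way a small random sample could be combined with a convenience sample; it should not be read as the authors' method. It assumes the indicator is a prevalence (a proportion), and that each respondent in the random sample is also asked whether they visit the clinic, so that the random sample can estimate the share of the population the convenience sample represents. The function name `combined_prevalence` and all data below are hypothetical.

```python
from typing import Sequence, Tuple


def combined_prevalence(
    random_sample: Sequence[Tuple[bool, bool]],
    convenience_outcomes: Sequence[bool],
) -> float:
    """Stratified estimate of a prevalence (illustrative assumption, not the paper's formula).

    random_sample: (visits_clinic, has_condition) for each randomly sampled person.
    convenience_outcomes: has_condition for each person in the clinic (convenience) sample.
    """
    if not random_sample:
        raise ValueError("the random sample must not be empty")

    # Split the random sample into clinic visitors and non-visitors.
    visitor_outcomes = [y for visits, y in random_sample if visits]
    non_visitor_outcomes = [y for visits, y in random_sample if not visits]

    # Only the random sample can estimate the share of the population that visits.
    share_visitors = len(visitor_outcomes) / len(random_sample)

    # Prevalence among visitors: pool the convenience data with the random-sample visitors.
    pooled = list(convenience_outcomes) + visitor_outcomes
    p_visitors = sum(pooled) / len(pooled) if pooled else 0.0

    # Prevalence among non-visitors: only the random sample observes them.
    p_non_visitors = (
        sum(non_visitor_outcomes) / len(non_visitor_outcomes)
        if non_visitor_outcomes
        else 0.0
    )

    # Weight each stratum by its estimated share of the population.
    return share_visitors * p_visitors + (1 - share_visitors) * p_non_visitors


# Hypothetical data: 10 random respondents and 95 clinic records.
random_sample = (
    [(True, True)] * 3 + [(True, False)] * 2      # 5 visitors, 3 with the condition
    + [(False, True)] * 1 + [(False, False)] * 4  # 5 non-visitors, 1 with the condition
)
convenience_outcomes = [True] * 47 + [False] * 48

estimate = combined_prevalence(random_sample, convenience_outcomes)
print(round(estimate, 3))  # approximately 0.35
```

In this made-up example the clinic data alone would suggest a prevalence near 0.5, while the combined estimate is pulled down by the lower prevalence among non-visitors, who appear only in the random sample. The large convenience sample still contributes by sharpening the estimate for the visitor stratum.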