Assessing and adjusting for bias in ecological analysis using multiple sample datasets.

Author information

Li Qingfeng

Affiliation

Department of International Health, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, E-8136, Baltimore, MD, 21205, USA.

Publication information

BMC Med Res Methodol. 2025 Apr 24;25(1):112. doi: 10.1186/s12874-025-02552-y.

PMID:40275196

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12023363/

Abstract

BACKGROUND

Ecological analysis uses group-level aggregate measures to investigate the complex relationships between individuals or groups and their environment. Despite its extensive applications across many disciplines, this approach remains susceptible to several biases, including the ecological fallacy.

METHODS

Our study identified another significant source of bias that arises in ecological analysis when aggregate measures are drawn from multiple sample datasets, a common practice in fields such as public health and medical research. We show that this bias is proportional to the sampling fraction used during data collection. We propose two methods to adjust for it: one that directly accounts for the sampling fraction and another based on measurement error models. We evaluate the effectiveness of these adjustments through formal mathematical derivations, simulations, and an empirical analysis of data from the 2014 Kenya Demographic and Health Survey.
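
The abstract does not give the formulas for either proposed estimator. As a rough illustration of the general measurement-error idea only (not the author's derivation and not either of the paper's estimators), the sketch below simulates groups whose X aggregate is the mean of a simple random sample drawn without replacement from a finite group population. It then regresses a group-level outcome on those sample means and applies a standard method-of-moments (regression-dilution) correction that subtracts the estimated sampling variance, including the finite-population correction (1 - f), from the variance of the sample means. Every function name, parameter value, and the data-generating process itself are hypothetical assumptions chosen for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n_groups=2000, pop_size=100, sample_size=10, beta=2.0, seed=1):
    """Hypothetical data-generating process; not the model used in the paper."""
    rng = random.Random(seed)
    f = sample_size / pop_size  # sampling fraction within each group
    pop_means, sample_means, err_vars, outcomes = [], [], [], []
    for _ in range(n_groups):
        mu = rng.gauss(0.0, 1.0)  # between-group spread of X
        population = [rng.gauss(mu, 2.0) for _ in range(pop_size)]  # finite group population
        sample = rng.sample(population, sample_size)  # simple random sample without replacement
        x_pop = mean(population)
        pop_means.append(x_pop)
        sample_means.append(mean(sample))
        # Estimated sampling variance of the sample mean, with finite-population correction
        err_vars.append((1 - f) * cov(sample, sample) / sample_size)
        # Group-level outcome driven by the true population mean (assumed error-free here)
        outcomes.append(beta * x_pop + rng.gauss(0.0, 0.5))
    return pop_means, sample_means, err_vars, outcomes


def slopes(pop_means, sample_means, err_vars, outcomes):
    # Slope obtainable only if complete group populations were observed
    oracle = cov(pop_means, outcomes) / cov(pop_means, pop_means)
    # Slope from sample-based group means: attenuated by sampling error in X
    naive = cov(sample_means, outcomes) / cov(sample_means, sample_means)
    # Method-of-moments correction: remove the average sampling variance
    # from the variance of the sample-based group means
    corrected = cov(sample_means, outcomes) / (
        cov(sample_means, sample_means) - mean(err_vars)
    )
    return oracle, naive, corrected


oracle, naive, corrected = slopes(*simulate())
print(f"oracle={oracle:.3f} naive={naive:.3f} corrected={corrected:.3f}")
```

Under these assumed settings the naive slope falls well below the oracle slope, while the corrected slope lands close to it. Because the correction divides by the variance of the sample means minus the average sampling variance, it becomes unstable when sampling error is a large share of the between-group variance, so this sketch is only meaningful when that share is modest.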

RESULTS

Our findings reveal that the sampling fraction bias can lead to significant underestimation of true relationships when using aggregate measures from multiple sample datasets. Both adjustment methods effectively mitigate this bias, with the measurement-error-adjusted estimator showing particular robustness in real-world applications. The results highlight the importance of accounting for sampling fraction bias in ecological analyses to ensure accurate inference.

CONCLUSION

Beyond the ecological fallacy uncovered by Robinson's seminal work, our research identified another critical bias in ecological analysis that is likely just as prevalent and consequential. The proposed adjustment methods provide potential tools for researchers to adjust for this bias, thereby improving the validity of ecological inferences. This study underscores the need for caution when pooling aggregate measures from multiple sample datasets and offers potential solutions to enhance the reliability of ecological analyses across research domains.

CLINICAL TRIAL NUMBER

Not applicable.
