Torres David J
Department of Mathematics and Physical Science, Northern New Mexico College, Española, NM, USA.
Monte Carlo Methods Appl. 2020 Mar;26(1):17-32. doi: 10.1515/mcma-2020-2054. Epub 2020 Feb 5.
Ecological and epidemiological studies often must rely on group-averaged data to make inferences about individual patterns. However, using correlations computed from averages to estimate correlations between individual scores is subject to the "ecological fallacy". This article constructs distributions of Pearson correlation coefficients computed from group-averaged (aggregate) data using Monte Carlo simulation and random sampling. We show that, as the group size increases, these distributions can be approximated by a generalized hypergeometric distribution. The expectation of the constructed distribution slightly underestimates the individual-level Pearson correlation, but the difference shrinks as the number of groups increases. The approximately normal distribution produced by Fisher's transformation can be used to build confidence intervals that estimate the Pearson correlation of the individual scores from the Pearson correlation of the aggregated scores.
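The procedure described in the abstract can be illustrated with a minimal sketch. It is not the author's code: it assumes individual scores are drawn from a bivariate normal distribution with a known correlation `rho` and that individuals are assigned to equal-sized groups at random. The function names `aggregate_pearson_samples` and `fisher_interval` and all parameter names are hypothetical, and the interval uses the standard Fisher z approximation with standard error 1/sqrt(n_groups - 3), which may differ in detail from the article's construction.

```python
import numpy as np


def aggregate_pearson_samples(rho, n_groups, group_size, n_trials, seed=0):
    """Monte Carlo sample of Pearson correlations computed from group means.

    Assumption (not from the article): individual (x, y) scores are bivariate
    normal with unit variances and correlation rho, grouped at random.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    r_values = np.empty(n_trials)
    for t in range(n_trials):
        # Draw all individual scores for this trial.
        scores = rng.multivariate_normal([0.0, 0.0], cov,
                                         size=n_groups * group_size)
        # Consecutive blocks of group_size rows form one group; average them.
        means = scores.reshape(n_groups, group_size, 2).mean(axis=1)
        # Pearson correlation of the group-averaged x and y values.
        r_values[t] = np.corrcoef(means[:, 0], means[:, 1])[0, 1]
    return r_values


def fisher_interval(r, n_groups, z_crit=1.959963984540054):
    """Approximate 95% interval from Fisher's transformation of an aggregate r.

    Uses z = arctanh(r) with standard error 1 / sqrt(n_groups - 3).
    """
    z = np.arctanh(r)
    half_width = z_crit / np.sqrt(n_groups - 3)
    return float(np.tanh(z - half_width)), float(np.tanh(z + half_width))


if __name__ == "__main__":
    r_values = aggregate_pearson_samples(rho=0.6, n_groups=20,
                                         group_size=10, n_trials=2000)
    print("mean aggregate r:", r_values.mean())
    print("interval for r = 0.6:", fisher_interval(0.6, n_groups=20))
```

Under these assumptions, comparing the mean of `r_values` against `rho` for several values of `n_groups` gives a quick check of the claim that the expectation approaches the individual-level correlation as the number of groups grows.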