Thomas D C
Environ Health Perspect. 1985 Oct;62:407-14. doi: 10.1289/ehp.8562407.
Point-source environmental hazards are often identified by examination of unusual clusters of disease cases. The very large number of potential clusters gives rise to the statistical problem of "multiple inference," i.e., the more clusters examined, the greater the risk of "false-positive" associations emerging by chance alone. This paper first distinguishes clusters identified by anecdotal observation from those that emerge from systematic searches. The latter may or may not include a systematic enumeration of potential causal factors associated with each potential disease cluster. If exposure information is not systematically available, empirical Bayes procedures are suggested as a basis for ranking the observed clusters in order of priority for further investigation. If exposure information is systematically available, empirical Bayes procedures can be used to select associations to report or to rank them in order of priority for confirmation. In addition, procedures are described for testing the global null hypothesis of no exposure-disease associations and for estimating the number of true-positive associations. These approaches are advocated in preference to the classical frequentist approach of multiplying p values by the number of tests performed.
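The contrast between the two strategies can be illustrated with a minimal sketch. It is not the paper's procedure: the abstract does not specify a model, so the code assumes a Poisson-gamma model fitted by the method of moments, one common way to carry out empirical Bayes shrinkage of observed/expected ratios. All function names and the example counts are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected): a one-sided cluster p value."""
    cdf_below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - cdf_below)


def bonferroni(p_values):
    """Classical correction: multiply each p value by the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


def eb_shrunken_ratios(observed, expected):
    """Empirical Bayes estimates of each cluster's relative risk.

    Assumed model (not taken from the paper): O_i ~ Poisson(theta_i * E_i),
    theta_i ~ Gamma(shape, rate), with the prior fitted by the method of moments.
    """
    total_e = sum(expected)
    prior_mean = sum(observed) / total_e
    raw = [o / e for o, e in zip(observed, expected)]
    # Weighted spread of the raw ratios, minus the part expected from Poisson noise.
    spread = sum(e * (r - prior_mean) ** 2 for e, r in zip(expected, raw)) / total_e
    prior_var = max(spread - prior_mean / (total_e / len(expected)), 0.0)
    if prior_var == 0.0:
        # No evidence of heterogeneity: every cluster shrinks fully to the mean.
        return [prior_mean] * len(observed)
    shape = prior_mean**2 / prior_var
    rate = prior_mean / prior_var
    # Posterior mean of theta_i under the Gamma(shape + O_i, rate + E_i) posterior.
    return [(shape + o) / (rate + e) for o, e in zip(observed, expected)]


def rank_clusters(observed, expected):
    """Cluster indices ordered by shrunken relative risk, highest priority first."""
    shrunk = eb_shrunken_ratios(observed, expected)
    return sorted(range(len(shrunk)), key=lambda i: shrunk[i], reverse=True)


# Hypothetical observed and expected case counts for five candidate clusters.
observed = [3, 12, 30, 1, 9]
expected = [0.5, 6.0, 20.0, 1.0, 9.0]

p_adjusted = bonferroni([poisson_upper_tail(o, e) for o, e in zip(observed, expected)])
shrunk = eb_shrunken_ratios(observed, expected)
priority = rank_clusters(observed, expected)
```

In this sketch a cluster with a small expected count, such as the first one, has its raw ratio pulled strongly toward the overall mean, so an extreme ratio based on a handful of cases does not automatically dominate the ranking. The Bonferroni-adjusted p values, by contrast, only give a report-or-discard decision for each cluster.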