An alternative foundation for the planning and evaluation of linkage analysis. II. Implications for multiple test adjustments.

Author information

Strug Lisa J, Hodge Susan E

Affiliation

Division of Statistical Genetics, Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, N.Y., USA.

Publication information

Hum Hered. 2006;61(4):200-9. doi: 10.1159/000094775. Epub 2006 Jul 27.

PMID:16877867

Abstract

The 'multiple testing problem' currently bedevils the field of genetic epidemiology. Briefly stated, the problem arises whenever more than one statistical test is performed, and it results in an increased probability of committing at least one Type I error. The accepted, conventional way of dealing with this problem is based on the classical Neyman-Pearson statistical paradigm and involves adjusting one's error probabilities. This adjustment is, however, problematic, because in the process one is also adjusting one's measure of evidence. Investigators have actually become wary of looking at their data, for fear of having to adjust the strength of the evidence they observed at a given locus on the genome every time they conduct an additional test. In a companion paper in this issue (Strug & Hodge I), we presented an alternative statistical paradigm, the 'evidential paradigm', to be used when planning and evaluating linkage studies. The evidential paradigm uses the lod score as the measure of evidence (as opposed to a p value) and provides new, alternatively defined error probabilities (alternatives to Type I and Type II error rates). We showed how this paradigm separates, or decouples, the two concepts of error probabilities and strength of the evidence. In the current paper we apply the evidential paradigm to the multiple testing problem, specifically multiple testing in the context of linkage analysis. We advocate using the lod score as the sole measure of the strength of evidence; we then derive the corresponding probabilities of being misled by the data under different multiple testing scenarios. We distinguish two situations: performing multiple tests of a single hypothesis vs. performing a single test of multiple hypotheses. For the first situation, we show that the probability of being misled remains small regardless of the number of times one tests the single hypothesis. For the second situation, we provide a rigorous argument outlining how replication samples themselves (analyzed in conjunction with the original sample) constitute appropriate adjustments for conducting multiple hypothesis tests on a data set.
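The claim for the first situation can be illustrated with the standard universal bound on misleading evidence. If H0 is true, the likelihood ratio LR = 10^lod in favor of H1 satisfies P(LR >= k) <= 1/k. Because LR is a nonnegative martingale with mean 1 under H0, Ville's inequality gives the same bound for the probability that LR ever reaches k when the data are re-examined after every new observation. In lod units, P(lod >= Z) <= 10^-Z however many times one looks. The abstract does not say whether this is exactly the derivation used in the paper. The Python sketch below is a hypothetical illustration, not the authors' linkage model. It assumes i.i.d. N(theta, 1) observations, a true H0: theta = 0, and a simple alternative H1: theta = delta. It also uses a lod threshold of Z = 1 (bound 0.1) instead of the conventional Z = 3 (bound 0.001), purely to keep the Monte Carlo run small. The function name `prob_misled` and its parameters are illustrative.

```python
import numpy as np

LOG10_E = np.log10(np.e)  # converts natural-log likelihood ratios to lod (base-10) units


def prob_misled(delta, lod_threshold, n_looks, n_sims=20_000, seed=0):
    """Monte Carlo estimate of the probability of misleading evidence under H0.

    Hypothetical illustrative model (not the paper's linkage model):
    X_1, X_2, ... i.i.d. N(theta, 1); H0: theta = 0 is true; H1: theta = delta.
    Returns (p_final, p_any):
      p_final = P(lod_n >= lod_threshold at the last look, n = n_looks)
      p_any   = P(lod_n >= lod_threshold at some look n <= n_looks)
    """
    rng = np.random.default_rng(seed)
    # Each row is one replicate data set generated under H0.
    x = rng.standard_normal((n_sims, n_looks))
    # Per-observation log LR: ln[f1(x)/f0(x)] = delta*x - delta**2/2.
    # Cumulative sums give the log LR after each additional observation (one "look" each).
    log_lr = np.cumsum(delta * x - delta**2 / 2, axis=1)
    lod = LOG10_E * log_lr
    p_final = float(np.mean(lod[:, -1] >= lod_threshold))
    p_any = float(np.mean(np.any(lod >= lod_threshold, axis=1)))
    return p_final, p_any


if __name__ == "__main__":
    z = 1.0  # lod threshold; the universal bound is 10**-z = 0.1
    for looks in (1, 10, 100):
        p_final, p_any = prob_misled(delta=0.5, lod_threshold=z, n_looks=looks)
        print(f"{looks:>3} looks: final look {p_final:.4f}, any look {p_any:.4f}, "
              f"bound {10**-z:.4f}")
```

Under these assumptions, the any-look probability grows with the number of looks. It should still stay below 10^-Z, which matches the behavior the abstract describes for repeated tests of a single hypothesis.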
