Province M A
Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri 63110, USA.
Adv Genet. 2001;42:499-514. doi: 10.1016/s0065-2660(01)42039-6.
As the preceding chapters illustrate, whole-genome scan analyses are becoming more common, yet there is considerable disagreement about how best to balance false positives against false negatives (traditionally called type I and type II errors in statistical parlance). Type I and type II errors can be controlled simultaneously if we are willing to let the sample size of the analysis vary. This is the insight of Wald (1947), which led to the theory of sequential sampling and inspired Newton Morton in developing the lod score method. We can exploit this idea further by capitalizing on an old but nearly forgotten theory, sequential multiple decision procedures (SMDP) (Bechhofer et al., 1968), which generalizes the standard "two-hypothesis" test to consider multiple alternative hypotheses. Using this theory, we can develop a single, genome-wide test that simultaneously partitions all markers into "signal" and "noise" groups, with tight control over both type I and type II errors (Province, 2000). Conceiving of this approach as an analysis tool for fixed-sample designs (rather than a true sequential sampling scheme), we can let the data decide when to move from the hypothesis-generation phase of a genome scan (where multiple comparisons make the interpretation of p values and significance levels difficult and controversial) to a true hypothesis-testing phase (where the problem of multiple comparisons has been all but eliminated, so that p values may be accepted at face value).
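The claim that both error rates can be fixed in advance when the sample size is allowed to vary can be illustrated with Wald's two-hypothesis sequential probability ratio test, the building block that SMDP generalizes. The following is a minimal sketch, not the chapter's SMDP procedure: it assumes independent normal observations with known standard deviation, and the function names (`wald_boundaries`, `sprt_normal_mean`) and parameters are hypothetical, chosen only for illustration.

```python
import math


def wald_boundaries(alpha, beta):
    """Wald's approximate stopping boundaries on the log-likelihood-ratio scale.

    alpha: target type I error rate; beta: target type II error rate.
    """
    lower = math.log(beta / (1.0 - alpha))   # at or below this: accept H0
    upper = math.log((1.0 - beta) / alpha)   # at or above this: accept H1
    return lower, upper


def sprt_normal_mean(observations, mu0, mu1, sigma, alpha, beta):
    """Sequential test of H0: mean = mu0 versus H1: mean = mu1.

    Assumes (for illustration only) i.i.d. normal data with known sigma.
    Returns (decision, number of observations used, final log-likelihood ratio).
    """
    lower, upper = wald_boundaries(alpha, beta)
    llr = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        # Log-likelihood-ratio contribution of one normal observation.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        if llr >= upper:
            return "accept H1", n, llr
        if llr <= lower:
            return "accept H0", n, llr
    # Data ran out before either boundary was crossed.
    return "continue sampling", n, llr


if __name__ == "__main__":
    decision, n_used, llr = sprt_normal_mean(
        [1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0, alpha=0.05, beta=0.05
    )
    print(decision, n_used, round(llr, 3))
```

The stopping point is set by the data rather than by the analyst: strong evidence crosses a boundary after few observations, while ambiguous data leave the test in the "continue sampling" state.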