Multiple testing in the context of haplotype analysis revisited: application to case-control data.

Authors

Becker T, Cichon S, Jönson E, Knapp M

Affiliation

Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn, Sigmund-Freud-Strasse 25, D-53105 Bonn, Germany.

Publication information

Ann Hum Genet. 2005 Nov;69(Pt 6):747-56. doi: 10.1111/j.1529-8817.2005.00198.x.

PMID:16266412

Abstract

We recently presented a testing procedure for family data that accounts for the multiple testing problem induced by the enormous number of different marker combinations that can be analyzed in a set of tightly linked markers. Most methods of haplotype-based association analysis already require simulations to obtain an uncorrected P value for a specific marker combination. As shown previously, however, nested simulations are not necessary to obtain a global P value that properly corrects for the multiple testing of different marker combinations without neglecting the dependency among the tests. We have now implemented this approach for case-control data in our program FAMHAP, as this data structure currently plays a dominant role in the field. To obtain uncorrected P values for the individual marker combinations, we consider different ways of dealing with phase ambiguities and two different test statistics: one is chi-square-based, the other is a haplotype trend regression. The performance of these tests in the multiple testing situation is investigated in a large simulation study. For all proposed test statistics, our global P values yield a considerable gain in power over Bonferroni-corrected P values. Good power was obtained both with the haplotype trend regression approach and with the simpler chi-square-based test. Furthermore, we conclude that the better strategy for dealing with phase ambiguities is to assign to each individual its list of weighted haplotype explanations rather than its single most likely haplotype explanation. Finally, we demonstrate the usefulness of our approach with a real data example.
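The central claim above is that a global P value can be obtained from a single layer of simulations, with no nested simulation. The following sketch illustrates that idea with a generic min-P scheme. It is a hypothetical illustration under stated assumptions, not FAMHAP's implementation. It assumes that null replicates are generated by permuting case/control labels and that a larger statistic means stronger association. It also assumes that each replicate's uncorrected P values are estimated from the same pool of replicates, with the observed data included. The names `permutation_stats`, `global_p_value` and `StatFn` are invented for this example. The per-combination statistic functions stand in for either of the two tests (chi-square-based or haplotype trend regression) and are not implemented here.

```python
import random
from bisect import bisect_left
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a statistic for ONE marker combination, computed from a
# vector of case/control labels (assumption: larger value = stronger association).
StatFn = Callable[[Sequence[int]], float]


def permutation_stats(labels: Sequence[int], stat_fns: List[StatFn],
                      n_perm: int, seed: int = 0) -> List[List[float]]:
    """Row 0: statistics on the observed labels; rows 1..n_perm: statistics
    after randomly permuting the case/control labels (a single simulation layer)."""
    rng = random.Random(seed)
    rows = [[f(labels) for f in stat_fns]]
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # each shuffle yields a uniformly random permutation
        rows.append([f(perm) for f in stat_fns])
    return rows


def global_p_value(stats: List[List[float]]) -> Tuple[List[float], float]:
    """Min-P correction that reuses the same replicates (no nested simulation).

    Returns the uncorrected P value of every combination for the observed data
    (row 0) and the global P value corrected for testing all combinations.
    """
    n_rows, n_comb = len(stats), len(stats[0])
    p = [[0.0] * n_comb for _ in range(n_rows)]
    for c in range(n_comb):
        col_sorted = sorted(row[c] for row in stats)
        for r in range(n_rows):
            # Fraction of all rows (observed included) with a statistic >= this one.
            n_ge = n_rows - bisect_left(col_sorted, stats[r][c])
            p[r][c] = n_ge / n_rows
    # Smallest uncorrected P value per row. Comparing the observed minimum with
    # the minima of the replicates keeps the dependence between the tests.
    min_p = [min(row) for row in p]
    global_p = sum(1 for m in min_p if m <= min_p[0]) / n_rows
    return p[0], global_p


# Toy example with made-up statistics for two marker combinations.
stats = [[9.0, 2.0],   # observed data
         [1.0, 3.0],   # null replicates
         [2.0, 1.0],
         [0.5, 4.0]]
p_obs, p_global = global_p_value(stats)
print(p_obs, p_global)  # [0.25, 0.75] 0.5
```

The design choice worth noting is that every replicate plays two roles. It is a null draw for the observed data, and its own uncorrected P values are ranked against the same pool, which is what removes the need for an inner simulation loop. With only one marker combination, the global P value equals the uncorrected one, as expected.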
