Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan.
Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America.
PLoS One. 2014 Jan 15;9(1):e85728. doi: 10.1371/journal.pone.0085728. eCollection 2014.
With the development of next-generation sequencing technology, there is great demand for powerful statistical methods to detect rare variants (minor allele frequencies (MAFs) < 1%) associated with diseases. Testing each variant site individually is known to be underpowered, so many methods have been proposed that test the association of a group of variants with phenotypes by pooling the signals of the variants in a chromosomal region. However, this pooling strategy inevitably includes a large proportion of neutral variants, which may compromise the power of association tests. To address this issue, we extend the [Formula: see text]-MidP method (Cheung et al., 2012, Genet Epidemiol 36: 675-685) and propose an approach (named 'adaptive combination of P-values for rare variant association testing', abbreviated as 'ADA') that adaptively combines per-site P-values with weights based on MAFs. Before combining P-values, we first impose a truncation threshold on the per-site P-values to guard against the noise caused by the inclusion of neutral variants. The ADA method is shown to outperform popular burden tests and non-burden tests under many scenarios. ADA is recommended for next-generation sequencing data analysis where many neutral variants may be included in a functional region.
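The truncate-then-combine idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weight function `maf_weight`, the Fisher-type sum of `-log(p)`, the candidate thresholds, and the example P-values and MAFs are all assumptions made for illustration, since the abstract states only that per-site P-values are truncated and combined with MAF-based weights. Assessing the significance of the resulting statistics (for instance by permutation) is not shown.

```python
import math


def maf_weight(maf):
    # Assumed weight that up-weights rarer variants; the abstract only
    # says the weights are based on MAFs, not which function is used.
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def truncated_weighted_stat(pvals, mafs, threshold):
    """Sum w_i * (-log p_i) over sites whose P-value is below the threshold."""
    total = 0.0
    for p, maf in zip(pvals, mafs):
        if p < threshold:  # truncation: sites at or above the threshold are dropped
            total += -maf_weight(maf) * math.log(p)
    return total


def stats_by_threshold(pvals, mafs, thresholds=(0.10, 0.15, 0.20)):
    # Hypothetical candidate thresholds; an adaptive procedure would
    # compare the evidence across them rather than fix one in advance.
    return {t: truncated_weighted_stat(pvals, mafs, t) for t in thresholds}


# Made-up per-site P-values and MAFs for four rare variants in one region.
pvals = [0.02, 0.60, 0.12, 0.85]
mafs = [0.005, 0.008, 0.002, 0.009]

for t, s in stats_by_threshold(pvals, mafs).items():
    print(f"threshold={t:.2f} statistic={s:.3f}")
```

In this toy input, the two sites with large P-values (0.60 and 0.85) never enter the statistic, which is the sense in which truncation guards against noise from presumably neutral variants.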