Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.
Genet Epidemiol. 2013 Feb;37(2):142-51. doi: 10.1002/gepi.21699. Epub 2012 Nov 26.
In the increasing number of sequencing studies aimed at identifying rare variants associated with complex traits, the power of the test can be improved by guided sampling procedures. We confirm both analytically and numerically that sampling individuals with extreme phenotypes can enrich the presence of causal rare variants and can therefore lead to an increase in power compared to random sampling. Although application of traditional rare variant association tests to these extreme phenotype samples requires dichotomizing the continuous phenotypes before analysis, the dichotomization procedure can decrease the power by reducing the information in the phenotypes. To avoid this, we propose a novel statistical method based on the optimal Sequence Kernel Association Test that allows us to test for rare variant effects using continuous phenotypes in the analysis of extreme phenotype samples. The increase in power of this method is demonstrated through simulation of a wide range of scenarios as well as in the triglyceride data of the Dallas Heart Study.
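The enrichment claim above can be illustrated with a small simulation. This is a minimal sketch and not the authors' analysis: the cohort size, the number of variants, the minor allele frequency, the per-allele effect of one phenotypic standard deviation, the 10% tail cutoff, and the function names are all assumptions made for the example, and every simulated variant is treated as causal with an effect in the same direction. It only compares how often rare-variant carriers appear in the two phenotype tails versus the full cohort; it does not implement the proposed continuous-phenotype test.

```python
import numpy as np


def simulate_cohort(n=20000, n_variants=10, maf=0.005, effect=1.0, seed=0):
    """Simulate rare-variant genotypes and a continuous phenotype.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Minor-allele counts (0, 1, or 2) per individual and variant.
    genotypes = rng.binomial(2, maf, size=(n, n_variants))
    # Assumption: every variant is causal and raises the trait by `effect` per allele.
    burden = genotypes.sum(axis=1)
    phenotype = effect * burden + rng.standard_normal(n)
    return genotypes, phenotype


def extreme_sample(phenotype, tail=0.10):
    """Return indices of individuals in the lower or upper `tail` fraction."""
    lo, hi = np.quantile(phenotype, [tail, 1.0 - tail])
    return np.flatnonzero((phenotype <= lo) | (phenotype >= hi))


def carrier_rate(genotypes):
    """Fraction of individuals carrying at least one minor allele."""
    return float((genotypes.sum(axis=1) > 0).mean())


genotypes, phenotype = simulate_cohort()
idx = extreme_sample(phenotype)

rate_all = carrier_rate(genotypes)          # what random sampling would see on average
rate_extreme = carrier_rate(genotypes[idx])  # what extreme-phenotype sampling sees

print(f"carrier rate, full cohort:      {rate_all:.3f}")
print(f"carrier rate, extreme sample:   {rate_extreme:.3f}")
```

Under these assumptions the carriers concentrate in the upper tail, so the combined extreme sample contains a higher fraction of carriers than the cohort as a whole. With effects in mixed directions or with many non-causal variants the enrichment would be weaker.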