Zhou Jin J, Hu Tao, Qiao Dandi, Cho Michael H, Zhou Hua
Division of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, Arizona 85724
Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695.
Genetics. 2016 Nov;204(3):921-931. doi: 10.1534/genetics.116.190454. Epub 2016 Sep 19.
Single nucleotide polymorphism (SNP) set tests are a powerful approach for analyzing next-generation sequencing (NGS) data. The popular sequence kernel association test (SKAT) method tests a set of variants as random effects in the linear mixed model setting. Its P-value is calculated from asymptotic theory that requires a large sample size. Consequently, SKAT is known to be conservative and can lose power at small or moderate sample sizes. Given the current cost of sequencing technology, the scale of NGS studies is still limited. In this report, we derive and implement computationally efficient, exact (nonasymptotic) score (eScore), likelihood ratio (eLRT), and restricted likelihood ratio (eRLRT) tests, collectively ExactVCTest, that can achieve high power even when sample sizes are small. We perform simulation studies under various genetic scenarios. Our ExactVCTest (i.e., eScore, eLRT, eRLRT) exhibits well-controlled type I error. Under the alternative model, eScore P-values are universally smaller than those from SKAT. eLRT and eRLRT demonstrate significantly higher power than eScore, SKAT, and SKAT optimal (SKAT-o) across all scenarios and sample sizes. We applied these tests to an exome sequencing study. Our findings replicate previous results and shed light on rare variant effects within genes. The software package is implemented in the open source, high-performance technical computing language Julia, and is freely available at https://github.com/Tao-Hu/VarianceComponentTest.jl. Analysis of each trait in the exome sequencing data set with 399 individuals and 16,619 genes takes around 1 min on a desktop computer.
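To make the distinction between an asymptotic and an exact (nonasymptotic) P-value concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' algorithm and does not reproduce ExactVCTest or the VarianceComponentTest.jl API. It assumes a linear kernel, Gaussian errors, and a scale-free form of the variance component score statistic, and it approximates the finite-sample null distribution by Monte Carlo simulation rather than by the efficient exact computation the abstract describes. The function names `score_stat` and `mc_pvalue` and all simulated data are hypothetical.

```python
import numpy as np


def score_stat(y, X, G):
    """Scale-free score-type statistic r'GG'r / r'r with a linear kernel.

    r is the residual of y after regressing out the fixed effects in X.
    Under the null model y = X*beta + noise, the ratio does not depend on
    beta or on the noise variance.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    s = G.T @ r
    return float(s @ s) / float(r @ r)


def mc_pvalue(y, X, G, n_sim=2000, seed=0):
    """Monte Carlo P-value from the finite-sample null distribution.

    Because the statistic is invariant to beta and the noise variance,
    simulating standard normal phenotypes reproduces its null distribution
    for the given X and G, with no large-sample approximation.
    """
    rng = np.random.default_rng(seed)
    q_obs = score_stat(y, X, G)
    n = len(y)
    null = np.array(
        [score_stat(rng.standard_normal(n), X, G) for _ in range(n_sim)]
    )
    # Add-one correction keeps the estimate strictly positive.
    return (1 + np.sum(null >= q_obs)) / (n_sim + 1)


# Hypothetical small study: 60 individuals, 10 variants, intercept plus one covariate.
rng = np.random.default_rng(1)
n, m = 60, 10
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n)  # no variant effect
print("Monte Carlo P-value under the null:", mc_pvalue(y, X, G))
```

The Monte Carlo error shrinks as `n_sim` grows, so resolving very small P-values this way is expensive; that cost is one reason a computationally efficient exact method matters at genome-wide scale.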