Tsai Shin-Fu, Tung Chih-Wei, Tsai Chen-An, Liao Chen-Tuo
Department of Agronomy, National Taiwan University, Taipei, Taiwan.
J Comput Biol. 2017 Dec;24(12):1254-1264. doi: 10.1089/cmb.2017.0140. Epub 2017 Nov 3.
Genome-wide association studies (GWAS) have been a powerful tool for exploring potential relationships between single-nucleotide polymorphisms (SNPs) and biological traits. To screen for important genetic variants, it is desirable to perform an exhaustive scan over the whole genome. However, this is usually a computationally daunting task, mainly because of the large number of SNPs in GWAS. In this article, we propose a computationally efficient algorithm for highly homozygous genomes. Pseudo standard error (PSE) is known to be a highly efficient and robust estimator for the standard deviation of a quantitative trait. We therefore develop a PSE-based statistical testing procedure for determining significant SNP main effects and SNP × SNP interactions associated with a quantitative trait. A simulation study is first conducted to evaluate its empirical size and power. It shows that the proposed PSE-based method generally keeps the empirical size sufficiently close to the nominal significance level. However, the power investigation indicates that the PSE-based method might lack power to identify significant effects of low-frequency variants if their true effect sizes are not large enough. Software implementing the proposed algorithm is provided, and its computational efficiency is evaluated through another simulation study. An exhaustive scan is usually completed within a very reasonable runtime, and a rice genome data set is analyzed with the software.
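As a rough illustration of the PSE idea mentioned in the abstract, the sketch below computes a pseudo standard error from a list of estimated effects. It assumes that PSE refers to Lenth's pseudo standard error (1.5 times the median of the absolute effects, recomputed after trimming effects larger than 2.5 times the initial estimate); the abstract does not spell out the definition, and the function name `lenth_pse` and the sample effect values are hypothetical. It is not the authors' software, and it does not show how SNP main effects or SNP × SNP interactions are turned into effect estimates.

```python
import statistics


def lenth_pse(effects):
    """Pseudo standard error of a collection of effect estimates.

    Assumption: this follows Lenth's definition, which may differ in
    detail from the procedure used in the paper.
    """
    abs_effects = [abs(e) for e in effects]
    if not abs_effects:
        raise ValueError("at least one effect estimate is required")

    # Initial robust scale estimate from all absolute effects.
    s0 = 1.5 * statistics.median(abs_effects)

    # Drop effects that look active (large relative to s0), then re-estimate.
    trimmed = [a for a in abs_effects if a < 2.5 * s0]
    if not trimmed:
        # Happens only when every effect is exactly zero.
        return 0.0
    return 1.5 * statistics.median(trimmed)


# Hypothetical effect estimates: four small effects and one large one.
example_effects = [0.1, -0.2, 0.3, -0.4, 5.0]
pse = lenth_pse(example_effects)  # approximately 0.375
```

Under this assumption, an effect would be judged by its ratio to the PSE; in the example, the large effect (5.0) is excluded from the final median, so it does not inflate the scale estimate it is compared against.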