School of Statistics, Shanxi University of Finance & Economics, Shanxi, China.
Laboratory of the Biology of Addictive Diseases, Rockefeller University, New York, US.
Sci Rep. 2019 Oct 29;9(1):15504. doi: 10.1038/s41598-019-51790-w.
Array- or sequencing-based genome-scale association studies with large sample sizes are extremely expensive to conduct. For a quantitative trait, an extreme case-control study design may improve power and reduce the cost of variant calling. We investigated the performance of the extreme study design when various proportions of samples are selected from the tails of the phenotype distribution. Using simulations, we show that extreme sampling is beneficial when risk genotypes are rare in the population and the effect size is relatively small. In particular, the numbers of selected cases and controls can be unbalanced, which increases power further compared with a balanced selection. Our application to two data sets, methadone dose data and yearling weight data, demonstrated that extreme sampling with only a fraction of the data can yield results similar to those of the full data analysis. Based on power analysis with simulated data and an application to experimental data, we conclude that when the full data are unavailable because of a restricted budget, an extreme sampling design is rewarding: it can reduce costs immensely while retaining power qualitatively similar to that of the full data analysis.
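The design described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the authors' simulation code: the function names, the parameter values, the additive carrier model with standard normal noise, and the use of a two-proportion z-test are all assumptions made here for illustration, since the abstract does not specify the model or the test. It draws a cohort, selects controls from the lower tail and cases from the upper tail of the phenotype distribution (the two fractions may differ, giving an unbalanced design), and estimates power as the share of replicates in which the test rejects.

```python
import math
import random


def simulate_cohort(n, carrier_freq, effect, rng):
    """Return a list of (phenotype, carrier) pairs for n individuals.

    Assumed model: phenotype = effect * carrier + standard normal noise.
    """
    cohort = []
    for _ in range(n):
        carrier = 1 if rng.random() < carrier_freq else 0
        phenotype = effect * carrier + rng.gauss(0.0, 1.0)
        cohort.append((phenotype, carrier))
    return cohort


def extreme_sample(cohort, lower_frac, upper_frac):
    """Take controls from the lower tail and cases from the upper tail."""
    ranked = sorted(cohort, key=lambda pair: pair[0])
    n_controls = int(len(ranked) * lower_frac)
    n_cases = int(len(ranked) * upper_frac)
    controls = ranked[:n_controls]
    cases = ranked[len(ranked) - n_cases:]
    return cases, controls


def two_proportion_z(cases, controls):
    """z statistic comparing carrier frequency in cases vs. controls."""
    n1, n0 = len(cases), len(controls)
    if n1 == 0 or n0 == 0:
        return 0.0
    x1 = sum(carrier for _, carrier in cases)
    x0 = sum(carrier for _, carrier in controls)
    pooled = (x1 + x0) / (n1 + n0)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n0))
    if se == 0.0:
        return 0.0  # no carriers (or only carriers) selected: no evidence
    return (x1 / n1 - x0 / n0) / se


def estimate_power(n, carrier_freq, effect, lower_frac, upper_frac,
                   reps=200, z_crit=1.96, seed=1):
    """Fraction of simulated replicates in which |z| exceeds z_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        cohort = simulate_cohort(n, carrier_freq, effect, rng)
        cases, controls = extreme_sample(cohort, lower_frac, upper_frac)
        if abs(two_proportion_z(cases, controls)) > z_crit:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Hypothetical settings: rare carriers, modest effect, 20% of the
    # cohort genotyped in total under both designs.
    balanced = estimate_power(2000, 0.05, 0.3, 0.10, 0.10)
    unbalanced = estimate_power(2000, 0.05, 0.3, 0.05, 0.15)
    print("balanced 10%/10%:", balanced)
    print("unbalanced 5%/15%:", unbalanced)
```

Both calls in the usage block genotype the same total fraction of the cohort, so any difference in estimated power reflects only how the selection is split between the two tails. Whether the unbalanced split wins depends on the assumed effect size and carrier frequency, so the settings should be varied before drawing conclusions.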