Johnston Henry Richard, Hu Yijuan, Cutler David J
Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America.
Genet Epidemiol. 2015 Mar;39(3):145-8. doi: 10.1002/gepi.21881. Epub 2015 Jan 12.
Geneticists have, for years, understood the nature of genome-wide association studies using common genomic variants. Recently, however, focus has shifted to the analysis of rare variants. This presents potential problems for researchers, as rare variants do not always behave in the same way common variants do, sometimes rendering decades of solid intuition moot. In this paper, we present examples of the differences between common and rare variants. We show why one must be considerably more careful about the origin of rare variants, and how failing to do so can lead to highly inflated type I error. We then explain how best to avoid such problems through careful understanding and study design. Additionally, we demonstrate that a seemingly low error rate in next-generation sequencing can dramatically inflate the false-positive rate for rare variants. Because rare variants are, by definition, seen infrequently, it is hard to distinguish sequencing errors from real variants. Compounding this problem, the proportion of errors among observed variants is likely to get worse, not better, with increasing sample size, so one cannot simply scale up to solve it. Understanding these potential pitfalls is a key step in successfully identifying true associations between rare variants and diseases.
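Why a larger sample can make the error problem worse can be illustrated with a back-of-the-envelope model. This is a minimal sketch under assumptions of our own, not the authors' analysis: true segregating sites follow Watterson's expectation for a neutral, constant-size population (which grows only logarithmically with the number of chromosomes), while each allele call at a monomorphic site is independently wrong with probability `eps` and no allele-count filtering is applied. The values `theta=1e-3` and `eps=1e-5` and the function names are hypothetical and illustrative only. Real human populations have expanded, which adds more rare variants than this model predicts, so the output is qualitative at best.

```python
import math


def watterson_harmonic(n_chrom: int) -> float:
    """Harmonic number a_n = sum_{i=1}^{n-1} 1/i for a sample of n_chrom chromosomes."""
    return math.fsum(1.0 / i for i in range(1, n_chrom))


def expected_fractions(n_individuals: int, theta: float, eps: float) -> tuple[float, float]:
    """Return (true_seg, false_seg) per site, under the assumed toy model.

    true_seg:  expected fraction of sites truly segregating (Watterson: theta * a_n).
    false_seg: expected fraction of sites that are monomorphic but show at least
               one erroneous non-reference allele call among all chromosomes.
    """
    n_chrom = 2 * n_individuals  # assume diploid individuals
    true_seg = min(1.0, theta * watterson_harmonic(n_chrom))
    # Each of the n_chrom allele calls is assumed wrong independently with prob. eps.
    p_any_error = 1.0 - (1.0 - eps) ** n_chrom
    false_seg = (1.0 - true_seg) * p_any_error
    return true_seg, false_seg


def error_share(n_individuals: int, theta: float = 1e-3, eps: float = 1e-5) -> float:
    """Fraction of apparent variant sites that are sequencing errors (hypothetical helper)."""
    true_seg, false_seg = expected_fractions(n_individuals, theta, eps)
    return false_seg / (true_seg + false_seg)


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"n={n:>6}: {error_share(n):.1%} of apparent variant sites are errors")
```

Under these assumptions, erroneous sites accumulate roughly linearly in the number of chromosomes while true sites accumulate only logarithmically, so the printed error share rises with sample size. This is consistent with the point that one cannot simply scale up to solve the problem.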