Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut.
Genet Epidemiol. 2020 Nov;44(8):934-947. doi: 10.1002/gepi.22350. Epub 2020 Aug 17.
In genome-wide association studies, signals associated with rare variants and with interactions between genes are hard to detect even when the sample size is in the tens of thousands. To overcome these problems, we examine the concept of the supervariant. Like the classic concept of the gene, a supervariant is a combination of alleles at multiple loci, but the contributing loci can be anywhere in the genome. We hypothesize that supervariants are easy to detect and that their aggregated signals are more stable in their associations with disease than those from a single-nucleotide polymorphism. Using the UK Biobank databases, we develop a ranking and aggregation method for identifying supervariants. Specifically, we examine 9,377 breast cancer cases and 46,861 controls matched by sex and age. In our simulations, the use of supervariants outperforms the single-nucleotide polymorphism-based association method in detecting rare variants and signals with an interactive structure. In real data analysis, we identify supervariants on Chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 16, and 22 that cover previously reported loci associated with breast or other cancers, as well as several novel loci on Chromosomes 2, 5, 9, and 12. These findings demonstrate the validity of supervariants and their potential for discovering replicable and novel results for complex diseases.
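The abstract does not spell out the ranking and aggregation procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that each single-nucleotide polymorphism is coded as a 0/1 carrier indicator, that loci are ranked by a marginal Pearson chi-square statistic, and that the top-ranked loci are combined by a simple "carrier at any selected locus" rule. The names `chi2_2x2`, `carrier_stat`, `build_supervariant`, and `top_k`, and the toy data, are all hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def carrier_stat(carrier, y):
    """Association between a 0/1 carrier indicator and 0/1 case status."""
    a = sum(1 for g, s in zip(carrier, y) if g and s)          # carrier cases
    b = sum(1 for g, s in zip(carrier, y) if g and not s)      # carrier controls
    c = sum(1 for g, s in zip(carrier, y) if not g and s)      # non-carrier cases
    d = sum(1 for g, s in zip(carrier, y) if not g and not s)  # non-carrier controls
    return chi2_2x2(a, b, c, d)


def build_supervariant(genotypes, y, top_k):
    """Rank loci by marginal association, then aggregate the top_k loci.

    genotypes: one list per locus of minor-allele counts (0, 1, or 2).
    Assumption (not from the paper): a subject carries the supervariant
    if they carry the minor allele at any of the selected loci.
    """
    carriers = [[1 if g > 0 else 0 for g in locus] for locus in genotypes]
    ranked = sorted(range(len(carriers)),
                    key=lambda j: carrier_stat(carriers[j], y),
                    reverse=True)
    chosen = ranked[:top_k]
    supervariant = [max(carriers[j][i] for j in chosen) for i in range(len(y))]
    return chosen, supervariant


# Toy data: 4 cases followed by 4 controls. Loci 0, 1, and 3 are each
# carried by one case only; locus 2 is carried equally by cases and controls.
y = [1, 1, 1, 1, 0, 0, 0, 0]
genotypes = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0, 0],
]

chosen, supervariant = build_supervariant(genotypes, y, top_k=3)
single_stats = [carrier_stat([1 if g > 0 else 0 for g in locus], y)
                for locus in genotypes]
# NOTE: ranking and testing on the same subjects inflates the statistic;
# a real analysis would select loci and test the aggregate on separate data.
print("selected loci:", chosen)
print("single-locus statistics:", single_stats)
print("supervariant statistic:", carrier_stat(supervariant, y))
```

In this toy example each rare locus carries only a weak marginal signal, while the aggregated indicator separates cases from controls more clearly, which is the intuition behind testing a supervariant rather than individual loci.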