Özbek Umut, Lin Hui-Min, Lin Yan, Weeks Daniel E, Chen Wei, Shaffer John R, Purcell Shaun M, Feingold Eleanor
Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York.
Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York.
Genet Epidemiol. 2018 Sep;42(6):539-550. doi: 10.1002/gepi.22132. Epub 2018 Jun 13.
In a genome-wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X-chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X-chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X-chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X-chromosome association have been proposed. The optimality of these statistics, however, rests on very specific, simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single-marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X-chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even under modest deviations from their assumptions, such as trait variance differences between the sexes or small sex differences in allele frequencies.
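The type I error inflation from omitting a sex effect can be reproduced in a small single-marker simulation. The sketch below is an illustration under stated assumptions, not the whole-chromosome simulation design used in this study. It assumes an additive coding in which females carry 0/1/2 copies of the allele and hemizygous males are coded 0/1, a normally distributed trait whose mean is shifted in males, and no genotype effect at all. Because mean allele dosage under this coding is lower in males than in females, the genotype acts as a proxy for sex, so an ordinary least-squares test that leaves sex out rejects the null far too often. The helper names `genotype_pvalue` and `simulate_type1` are hypothetical, and the Wald test uses a normal approximation in place of the t distribution.

```python
import math

import numpy as np


def genotype_pvalue(y, g, covariates):
    """Two-sided Wald p-value for the genotype coefficient in an OLS fit.

    Assumption: a normal approximation to the t distribution, which is
    adequate for the large sample sizes used in this sketch.
    """
    n = len(y)
    # Design matrix: intercept, genotype, then any extra covariates.
    X = np.column_stack([np.ones(n), g] + covariates)
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    z = beta[1] / math.sqrt(cov[1, 1])
    return math.erfc(abs(z) / math.sqrt(2.0))


def simulate_type1(n_reps=500, n_per_sex=1000, maf=0.3, sex_effect=1.0,
                   alpha=0.05, seed=1):
    """Empirical type I error with and without a sex covariate (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    male = np.r_[np.ones(n_per_sex), np.zeros(n_per_sex)]
    hits_no_sex = 0
    hits_with_sex = 0
    for _ in range(n_reps):
        # Assumed coding: hemizygous males carry one X allele, coded 0/1.
        g_male = rng.binomial(1, maf, n_per_sex)
        # Females carry two X alleles, coded 0/1/2.
        g_female = rng.binomial(2, maf, n_per_sex)
        g = np.r_[g_male, g_female].astype(float)
        # Null model: the trait depends on sex only, never on genotype.
        y = sex_effect * male + rng.standard_normal(2 * n_per_sex)
        hits_no_sex += genotype_pvalue(y, g, []) < alpha
        hits_with_sex += genotype_pvalue(y, g, [male]) < alpha
    return hits_no_sex / n_reps, hits_with_sex / n_reps


if __name__ == "__main__":
    rate_no_sex, rate_with_sex = simulate_type1()
    print(f"type I error, no sex term:   {rate_no_sex:.3f}")
    print(f"type I error, with sex term: {rate_with_sex:.3f}")
```

With the default settings, the rejection rate from the model without sex should sit far above the nominal 0.05, while the model with a sex covariate should stay close to it. Setting `sex_effect=0.0` is a quick way to confirm that the inflation in this sketch comes from the sex difference in trait mean combined with the sex difference in mean allele dosage.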