Statistics Department, Tel Aviv University, Ramat Aviv 6997801, Israel; Computer Science Department, Technion - Israel Institute of Technology, Haifa 3200003, Israel.
Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90405, USA.
Am J Hum Genet. 2018 Jul 5;103(1):89-99. doi: 10.1016/j.ajhg.2018.06.002.
Methods that estimate SNP-based heritability and genetic correlations from genome-wide association studies have proven to be powerful tools for investigating the genetic architecture of common diseases and exposing unexpected relationships between disorders. Many relevant studies employ a case-control design, yet most methods are primarily geared toward analyzing quantitative traits. Here we investigate the validity of three common methods for estimating SNP-based heritability and genetic correlation between diseases. We find that the phenotype-correlation-genotype-correlation (PCGC) approach is the only method that can estimate both quantities accurately in the presence of important non-genetic risk factors, such as age and sex. We extend PCGC to work with arbitrary genetic architectures and with summary statistics that take the case-control sampling into account, and we demonstrate that our new method, PCGC-s, accurately estimates both SNP-based heritability and genetic correlations and can be applied to large datasets without requiring individual-level genotypic or phenotypic information. Finally, we use PCGC-s to estimate the genetic correlation between schizophrenia and bipolar disorder and demonstrate that previous estimates are biased, partially due to incorrect handling of sex as a strong risk factor.
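As a rough illustration of the regression idea behind PCGC, the sketch below regresses products of standardized case-control phenotypes on off-diagonal genetic relationship values and rescales the slope to the liability scale. It assumes the simple covariate-free setting, not the PCGC-s method with covariates and summary statistics described in the abstract. The abstract gives no formulas, so the liability-threshold correction constant is an assumption here, and the names `liability_constant`, `pcgc_h2`, `grm`, `y`, and `K` are hypothetical.

```python
from statistics import NormalDist


def liability_constant(P, K):
    """Assumed correction factor relating observed-scale phenotype products
    to liability-scale heritability (P: case fraction in the sample,
    K: disease prevalence in the population)."""
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1.0 - K)   # liability threshold implied by K
    phi_t = std_normal.pdf(t)         # standard normal density at the threshold
    return P * (1.0 - P) * phi_t ** 2 / (K ** 2 * (1.0 - K) ** 2)


def pcgc_h2(grm, y, K):
    """Covariate-free sketch: regress z_i * z_j on grm[i][j] over all pairs
    i < j (no intercept), then divide the slope by the correction factor.

    grm : n x n genetic relationship matrix (list of lists)
    y   : list of 0/1 case-control labels
    K   : population prevalence of the disease
    """
    n = len(y)
    P = sum(y) / n
    sd = (P * (1.0 - P)) ** 0.5
    z = [(yi - P) / sd for yi in y]   # standardized phenotypes

    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            num += grm[i][j] * z[i] * z[j]
            den += grm[i][j] ** 2
    slope = num / den                 # least-squares slope through the origin
    return slope / liability_constant(P, K)
```

With real data, `grm` would be computed from standardized genotypes. When risk factors such as age and sex matter, this covariate-free form is not sufficient, and that case is what the abstract's extension addresses.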