Stram Daniel O
Department of Preventive Medicine, Keck School of Medicine, University of Southern California, 1540 Alcazar Street, Los Angeles, CA, 90032, USA.
Methods Mol Biol. 2017;1666:485-504. doi: 10.1007/978-1-4939-7274-6_24.
Haplotype analysis forms the basis of much of genetic association analysis in both related and unrelated individuals (we concentrate on unrelated individuals). For example, haplotype analysis indirectly underlies the SNP imputation methods used to test trait associations with known but unmeasured variants and to perform collaborative post-GWAS meta-analysis. This chapter focuses on the direct use of haplotypes in association testing. It reviews the rationale for haplotype-based association testing, discusses the statistical issues arising from haplotype uncertainty that affect the analysis, and then gives practical guidance for testing haplotype-based associations with a phenotype or outcome trait, first for candidate gene regions and then for the genome as a whole. Haplotypes are of interest for two reasons. First, they may be in closer LD with a causal variant than any single measured SNP, and may therefore enhance the coverage value of the genotypes relative to single-SNP analysis. Second, haplotypes may themselves be the causal variants of interest, and some solid examples of this have appeared in the literature. This chapter discusses three possible approaches to incorporating SNP haplotype analysis into generalized linear regression models: (1) a simple substitution method involving imputed haplotypes, (2) simultaneous maximum likelihood (ML) estimation of all parameters, including haplotype frequencies and regression parameters, and (3) a simplified approximation to full ML for case-control data. Examples of the various approaches for a haplotype analysis of a candidate gene are provided. We compare the behavior of the approximation-based methods and argue that in most instances the simpler methods hold up well in practice. We also describe the practical implementation of haplotype risk estimation genome-wide and discuss several shortcuts that can be used to speed up what would otherwise be very intensive computations.
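As a rough illustration of approach (1), the sketch below estimates haplotype frequencies for two biallelic SNPs with a basic EM algorithm, computes each person's expected haplotype dosage, and then substitutes that dosage for the unknown haplotype count in a regression. It is a minimal sketch under stated assumptions, not the chapter's own implementation: it assumes Hardy-Weinberg equilibrium and unrelated individuals, uses an ordinary least-squares line (identity link) in place of a general GLM, and all function names, genotypes, and trait values are hypothetical.

```python
from itertools import product

# The four possible haplotypes over two biallelic SNPs (allele at SNP1, allele at SNP2).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """Ordered haplotype index pairs whose allele counts add up to the genotype."""
    return [
        (i, j)
        for i, j in product(range(4), repeat=2)
        if all(HAPLOTYPES[i][k] + HAPLOTYPES[j][k] == genotype[k] for k in range(2))
    ]


def expected_dosages(genotypes, freqs):
    """Expected copies of each haplotype per person, given frequencies (HWE assumed)."""
    rows = []
    for g in genotypes:
        pairs = compatible_pairs(g)
        weights = [freqs[i] * freqs[j] for i, j in pairs]
        total = sum(weights)
        row = [0.0] * 4
        for w, (i, j) in zip(weights, pairs):
            row[i] += w / total
            row[j] += w / total
        rows.append(row)
    return rows


def em_frequencies(genotypes, n_iter=50):
    """EM estimate of haplotype frequencies from unphased genotypes."""
    freqs = [0.25] * 4  # start from equal frequencies
    for _ in range(n_iter):
        dos = expected_dosages(genotypes, freqs)
        freqs = [sum(r[h] for r in dos) / (2 * len(genotypes)) for h in range(4)]
    return freqs


def least_squares_line(x, y):
    """Intercept and slope of the ordinary least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical unphased genotypes (minor-allele counts at SNP1, SNP2) and trait values.
genotypes = [(0, 0), (2, 2), (1, 1), (0, 1), (1, 0), (2, 1), (0, 0), (1, 1)]
trait = [1.0, 5.2, 2.9, 1.1, 0.8, 3.1, 1.3, 3.0]

freqs = em_frequencies(genotypes)
dosages = expected_dosages(genotypes, freqs)

# Substitution step: regress the trait on the expected dosage of haplotype (1, 1).
dosage_11 = [row[3] for row in dosages]
intercept, slope = least_squares_line(dosage_11, trait)
```

Only the double heterozygote (1, 1) is phase-ambiguous here, so its expected dosages are fractional while every other genotype receives whole-number counts. The substitution method treats these expected dosages as if they were observed covariates, which is the simplification that the full ML approach in (2) avoids.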