Chen Rui, Wei Qiang, Zhan Xiaowei, Zhong Xue, Sutcliffe James S, Cox Nancy J, Cook Edwin H, Li Chun, Chen Wei, Li Bingshan
Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA.
Bioinformatics. 2015 May 1;31(9):1452-9. doi: 10.1093/bioinformatics/btu860. Epub 2015 Jan 6.
A major focus of current sequencing studies in human genetics is to identify rare variants associated with complex diseases. Beyond the reduced power to detect associated rare variants, controlling for population stratification is particularly challenging for them. Transmission/disequilibrium tests (TDT) based on family designs are robust to population stratification and admixture, and therefore offer an effective way to eliminate spurious associations in rare variant association studies. To increase the power of rare variant association analysis, gene-based collapsing methods have become standard. Existing methods that extend this strategy to rare variants in families usually combine TDT statistics at individual variants and therefore lack the flexibility to incorporate other genetic models.
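For readers unfamiliar with the single-marker TDT referred to above, the classical statistic can be sketched as a McNemar-type test on transmissions from heterozygous parents. This is a minimal illustration of the textbook form, not the authors' code; the function names `tdt_statistic` and `chi2_1df_pvalue` are hypothetical, and the counts `b` and `c` are assumed to have been tallied already.

```python
import math


def tdt_statistic(b: int, c: int) -> float:
    """Classical TDT statistic, (b - c)^2 / (b + c).

    b: number of heterozygous parents who transmitted the variant allele
    c: number of heterozygous parents who did not transmit it
    (Both counts are assumed inputs for this sketch.)
    """
    n = b + c
    if n == 0:
        # No informative (heterozygous) parents: no evidence either way.
        return 0.0
    return (b - c) ** 2 / n


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))
```

With 30 transmissions against 10 non-transmissions, `tdt_statistic(30, 10)` gives 10.0. For rare variants `b + c` is typically very small at any single site, which is why pooling information across a gene is attractive.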
In this study, we describe a haplotype-based framework for group-wise TDT (gTDT) that is flexible enough to encompass a variety of genetic models, such as additive, dominant and compound heterozygous (CH) (i.e. recessive) models, as well as other complex interactions. Unlike existing methods, gTDT constructs haplotypes by transmission when possible and inherently takes into account the linkage disequilibrium among variants. Through extensive simulations we showed that type I error was correctly controlled for rare variants under all models investigated, and that this remained true in the presence of population stratification. Under a variety of genetic models, gTDT showed increased power compared with the single-marker TDT. Application of gTDT to an autism exome sequencing dataset of 118 trios identified potentially interesting candidate genes with CH rare variants.
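The idea of scoring transmitted versus untransmitted haplotypes under interchangeable genetic models can be illustrated with a small sketch. It is not the gTDT implementation: it assumes haplotypes are already phased by transmission, collapses each haplotype to "carries any rare allele", compares the child's genotype with a single pseudo-control built from the two untransmitted haplotypes, and uses a simple sign-symmetric score statistic. All names (`carries`, `dominant`, `compound_het`, `additive`, `groupwise_statistic`) are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# A haplotype is a tuple of 0/1 rare-allele indicators across a gene's variants.
Haplotype = Tuple[int, ...]
# A trio is (father transmitted, father untransmitted,
#            mother transmitted, mother untransmitted).
Trio = Tuple[Haplotype, Haplotype, Haplotype, Haplotype]


def carries(h: Haplotype) -> int:
    """1 if the haplotype carries at least one rare allele, else 0."""
    return int(any(h))


def dominant(h1: Haplotype, h2: Haplotype) -> int:
    """At least one of the two haplotypes carries a rare allele."""
    return int(carries(h1) or carries(h2))


def compound_het(h1: Haplotype, h2: Haplotype) -> int:
    """Both haplotypes carry a rare allele (recessive-type model)."""
    return int(carries(h1) and carries(h2))


def additive(h1: Haplotype, h2: Haplotype) -> int:
    """Number of carrier haplotypes (0, 1 or 2)."""
    return carries(h1) + carries(h2)


def groupwise_statistic(
    trios: Sequence[Trio],
    model: Callable[[Haplotype, Haplotype], int],
) -> float:
    """Score the child against the untransmitted pseudo-control.

    Under the null, transmitted and untransmitted haplotypes are
    exchangeable, so each difference d is symmetric about zero and
    (sum d)^2 / sum d^2 is approximately chi-square with 1 df.
    For 0/1 models this reduces to the McNemar form (b - c)^2 / (b + c).
    """
    diffs = [model(ft, mt) - model(fu, mu) for ft, fu, mt, mu in trios]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        # No trio is informative under this model.
        return 0.0
    return sum(diffs) ** 2 / denom
```

Swapping the `model` argument changes the genetic model without touching the test itself, which is the kind of flexibility the haplotype-based formulation is meant to provide. A fuller treatment would also handle ambiguous phase and more than one affected child per family.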
We implemented gTDT in C++; the source code and detailed usage instructions are available on the authors' website (https://medschool.vanderbilt.edu/cgg).
bingshan.li@vanderbilt.edu or wei.chen@chp.edu
Supplementary data are available at Bioinformatics online.