Qiao Dandi, Lange Christoph, Laird Nan M, Won Sungho, Hersh Craig P, Morrow Jarrett, Hobbs Brian D, Lutz Sharon M, Ruczinski Ingo, Beaty Terri H, Silverman Edwin K, Cho Michael H
Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America.
Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America.
Genet Epidemiol. 2017 May;41(4):309-319. doi: 10.1002/gepi.22037. Epub 2017 Feb 13.
Whole-exome sequencing using family data has identified rare coding variants in Mendelian diseases, and in complex diseases with Mendelian subtypes, by using filters based on variant novelty, functionality, and segregation with the phenotype within families. However, formal statistical approaches for this setting are limited. We propose a gene-based segregation test (GESE) that quantifies the uncertainty of the filtering approach. It is constructed using the probability of segregation events under the null hypothesis of Mendelian transmission. This test takes into account different degrees of relatedness in families, the number of functional rare variants in the gene, and their minor allele frequencies in the corresponding population. In addition, a weighted version of this test allows additional subject phenotypes to be incorporated to improve statistical power. We show via simulations that the GESE and weighted GESE tests maintain appropriate type I error rates and have greater power than several commonly used region-based methods. We apply our method to whole-exome sequencing data from 49 extended pedigrees with severe, early-onset chronic obstructive pulmonary disease (COPD) in the Boston Early-Onset COPD study (BEOCOPD) and identify several promising candidate genes. Our proposed methods show great potential for identifying rare coding variants of large effect and high penetrance in family-based sequencing data. The proposed tests are implemented in an R package that is available on CRAN (https://cran.r-project.org/web/packages/GESE/).
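The abstract describes the test only in words. The sketch below illustrates the general idea of a segregation probability under the null of Mendelian transmission, under loudly stated assumptions. It is not the GESE package's implementation or API, and it may differ from the paper's exact statistic. The following are all hypothetical choices made for illustration:

- the pedigree encoding;
- the function names (`founder_dist`, `child_dist`, `segregation_prob`, `family_gene_prob`, `poisson_binomial_upper_tail`);
- the rule that variants in a gene are treated as independent;
- the use of the count of segregating families as the test statistic.

For each family, the sketch works in three steps:

1. Founder genotypes are drawn from Hardy-Weinberg proportions at the variant's minor allele frequency.
2. Children inherit alleles by Mendelian transmission.
3. The code sums the probability of every configuration in which all affected members carry the variant and all unaffected members do not.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical pedigree encoding: person id -> (father id, mother id);
# founders map to (None, None). `order` must list parents before children.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def founder_dist(q: float) -> Dict[int, float]:
    """Copies of the minor allele in a founder under Hardy-Weinberg equilibrium."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}


def child_dist(g_father: int, g_mother: int) -> Dict[int, float]:
    """Mendelian null: each parent transmits the minor allele with probability copies / 2."""
    pf, pm = g_father / 2, g_mother / 2
    return {0: (1 - pf) * (1 - pm), 1: pf * (1 - pm) + (1 - pf) * pm, 2: pf * pm}


def segregation_prob(ped: Pedigree, order: Sequence[str], affected: Sequence[str],
                     unaffected: Sequence[str], q: float) -> float:
    """Exact P(every affected member carries >= 1 copy and every unaffected
    member carries none) for one variant with minor allele frequency q.
    Enumerates all 3**n genotype configurations, so only small pedigrees."""
    total = 0.0
    geno: Dict[str, int] = {}

    def recurse(i: int, prob: float) -> None:
        nonlocal total
        if prob == 0.0:
            return  # impossible branch, prune it
        if i == len(order):
            if all(geno[a] > 0 for a in affected) and all(geno[u] == 0 for u in unaffected):
                total += prob
            return
        pid = order[i]
        father, mother = ped[pid]
        dist = founder_dist(q) if father is None else child_dist(geno[father], geno[mother])
        for g, p in dist.items():
            geno[pid] = g
            recurse(i + 1, prob * p)
        del geno[pid]

    recurse(0, 1.0)
    return total


def family_gene_prob(ped: Pedigree, order: Sequence[str], affected: Sequence[str],
                     unaffected: Sequence[str], mafs: Sequence[float]) -> float:
    """Hypothetical gene-level event: at least one of the gene's rare variants
    segregates. Treats variants as independent, a simplification that ignores linkage."""
    p_none = 1.0
    for q in mafs:
        p_none *= 1 - segregation_prob(ped, order, affected, unaffected, q)
    return 1 - p_none


def poisson_binomial_upper_tail(probs: Sequence[float], k: int) -> float:
    """P(T >= k), where T counts segregating families, assumed independent under the null."""
    dist = [1.0]  # dist[j] = P(j segregating families among those processed so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, d in enumerate(dist):
            new[j] += d * (1 - p)
            new[j + 1] += d * p
        dist = new
    return sum(dist[k:])


if __name__ == "__main__":
    # Hypothetical trio: affected father F, unaffected mother M, affected child C.
    trio: Pedigree = {"F": (None, None), "M": (None, None), "C": ("F", "M")}
    p_trio = family_gene_prob(trio, ["F", "M", "C"], ["F", "C"], ["M"], [0.001, 0.005])
    # Hypothetical data: 3 of 10 such families show segregation.
    print(poisson_binomial_upper_tail([p_trio] * 10, 3))
```

Exact enumeration keeps the example checkable by hand. For a trio with minor allele frequency q, the segregation probability works out to q(1 - q)^2. However, the cost grows as 3^n, so the extended pedigrees mentioned above would need a more scalable computation, such as gene dropping or peeling.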