Department of Epidemiology and Biostatistics, Indiana University at Bloomington, Indiana 47405.
Department of Biostatistics, Columbia University, New York, New York 10032.
Genetics. 2018 Oct;210(2):463-476. doi: 10.1534/genetics.118.301266. Epub 2018 Aug 13.
The genetic etiology of many complex diseases is highly heterogeneous. A complex disease can be caused by multiple mutations within the same gene or by mutations in multiple genes at various genomic loci. Although these disease-susceptibility mutations can be collectively common in the population, they are often individually rare or even private to certain families. Family-based studies are powerful for detecting rare variants enriched in families, an important feature for sequencing studies given the heterogeneous nature of rare variants. In addition, family designs provide robust protection against population stratification. Nevertheless, statistical methods for analyzing family-based sequencing data are underdeveloped, especially those that account for the heterogeneous etiology of complex diseases. In this article, we introduce a random field framework for detecting gene-phenotype associations in family-based sequencing studies, referred to as the family-based genetic random field (FGRF). Like existing family-based association tests, FGRF can use within-family and between-family information separately or jointly to test for an association. We demonstrate that FGRF has statistical power comparable to that of existing methods when there is no genetic heterogeneity, and can improve statistical power when there is genetic heterogeneity across families. The proposed method also shares the advantages of conventional family-based association tests (e.g., being robust to population stratification). Finally, we applied the proposed method to sequencing data from the Minnesota Twin Family Study and identified several genes, including , potentially associated with alcohol dependence.