Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, Maryland, United States of America.
Genet Epidemiol. 2013 Nov;37(7):726-42. doi: 10.1002/gepi.21757.
This paper develops functional linear models for testing associations between quantitative traits and genetic variants, which may be rare, common, or a combination of the two. When the multiple genetic variants of an individual in a human population are treated as a realization of a stochastic process, the genome of that individual in a chromosome region becomes a continuum of sequence data rather than a set of discrete observations. The genome is thus viewed as a stochastic function that carries both linkage and linkage disequilibrium (LD) information about the genetic markers. Using techniques from functional data analysis, both fixed and mixed effect functional linear models are built to test the association between quantitative traits and genetic variants while adjusting for covariates. Extensive simulations show that, in most cases, the F-distributed tests of the proposed fixed effect functional linear models have higher power than the sequence kernel association test (SKAT) and its optimal unified test (SKAT-O) in three scenarios: (1) the causal variants are all rare, (2) the causal variants are both rare and common, and (3) the causal variants are all common. The superior performance of the fixed effect functional linear models is most likely due to their optimal use of both the genetic linkage and LD information of multiple genetic variants in a genome and the similarity among different individuals, whereas SKAT and SKAT-O model only the similarities and pairwise LD and do not sufficiently model linkage and higher order LD information. In addition, the proposed fixed effect models generate accurate type I error rates in simulation studies. We also show that the functional kernel score tests of the proposed mixed effect functional linear models are preferable in candidate gene analysis and small sample problems. The methods are applied to three biochemical traits in data from the Trinity Students Study.
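The fixed effect test described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which basis is used or how the genotype function is smoothed, so the Fourier basis, the number of basis functions, and the names `fourier_basis` and `flm_f_test` are all assumptions made for illustration. The sketch expands only the genetic effect function beta(t) in the basis, regresses the trait on covariates plus the basis-projected genotypes, and compares that fit with a covariates-only fit using an F-test.

```python
import numpy as np
from scipy import stats


def fourier_basis(t, n_basis):
    """Evaluate the first n_basis Fourier functions (1, sin, cos, ...) at t in [0, 1]."""
    cols = [np.ones_like(t, dtype=float)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)


def _rss(design, y):
    """Residual sum of squares of the least-squares fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def flm_f_test(y, genotypes, positions, covariates, n_basis=5):
    """F-test of a genetic effect function expanded in a Fourier basis (illustrative).

    y          : (n,) quantitative trait
    genotypes  : (n, m) minor-allele counts (0, 1, 2) at m variants
    positions  : (m,) physical positions; at least two must be distinct
    covariates : (n, c) covariates to adjust for (intercept is added here)
    """
    y = np.asarray(y, dtype=float)
    genotypes = np.asarray(genotypes, dtype=float)
    positions = np.asarray(positions, dtype=float)
    n = y.shape[0]

    # Rescale positions to [0, 1] so the basis is defined on a unit interval.
    t = (positions - positions.min()) / (positions.max() - positions.min())
    psi = fourier_basis(t, n_basis)   # (m, K): basis evaluated at each variant
    w = genotypes @ psi               # (n, K): basis-projected genotypes

    null_design = np.column_stack([np.ones(n), covariates])
    full_design = np.column_stack([null_design, w])

    rank_null = np.linalg.matrix_rank(null_design)
    rank_full = np.linalg.matrix_rank(full_design)
    df1 = rank_full - rank_null       # number of genetic parameters tested
    df2 = n - rank_full               # residual degrees of freedom

    rss_null = _rss(null_design, y)
    rss_full = _rss(full_design, y)
    f_stat = ((rss_null - rss_full) / df1) / (rss_full / df2)
    p_value = float(stats.f.sf(f_stat, df1, df2))
    return f_stat, p_value
```

Calling `flm_f_test(y, genotypes, positions, covariates)` returns the F statistic and its p-value for the null hypothesis of no genetic effect in the region. Because the number of tested parameters is set by the basis size rather than by the number of variants, the test stays low-dimensional even when a region contains many rare variants.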