Cheng Yichen, Dai James Y, Kooperberg Charles
Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
Biostatistics. 2016 Apr;17(2):221-34. doi: 10.1093/biostatistics/kxv035. Epub 2015 Sep 28.
In the genomic era, group association tests are of great interest. Because the number of individual genomic features is overwhelming, the power to detect association when testing one genomic feature at a time is often very low, and the effect sizes of most features are small. Many methods have been proposed to test association of a trait with a group of features within a functional unit as a whole, e.g. all SNPs in a gene, yet few of these methods account for the fact that generally a substantial proportion of the features are not associated with the trait. In this paper, we propose to model the association for each feature in the group as a mixture of features with no association and features with non-zero associations, to explicitly account for the possibility that a fraction of features may not be associated with the trait while other features in the group are. The feature-level associations are first estimated by generalized linear models; the sequence of these estimated associations is then modeled by a hidden Markov chain. To test for global association, we develop a modified likelihood ratio test based on a log-likelihood function that ignores higher order dependency plus a penalty term. We derive the asymptotic distribution of the likelihood ratio test under the null hypothesis. Furthermore, we obtain the posterior probability of association for each feature, which provides evidence of feature-level association and is useful for potential follow-up studies. In simulations and a data application, we show that our proposed method performs well when compared with existing group association tests, especially when only a few features are associated with the outcome.
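The feature-level posterior probabilities mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-feature association estimates have already been standardized to z-scores, that a null feature emits N(0, 1) and an associated feature emits N(0, alt_sd^2), and that all hidden Markov parameters are fixed by hand rather than estimated. The function name `posterior_association` and every default value are hypothetical, and the modified likelihood ratio test itself is not implemented here.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def posterior_association(z, p_stay_null=0.9, p_stay_assoc=0.7,
                          alt_sd=3.0, init_assoc=0.2):
    """Posterior P(feature is associated | all z-scores) under a two-state HMM.

    All parameters are illustrative assumptions, not values from the paper.
    State 0 = no association, state 1 = non-zero association.
    """
    z = np.asarray(z, dtype=float)
    n = z.size

    # Emission densities: column 0 is the null state, column 1 the associated state.
    emis = np.column_stack([normal_pdf(z, 0.0, 1.0),
                            normal_pdf(z, 0.0, alt_sd)])
    # Row i holds the transition probabilities out of state i.
    trans = np.array([[p_stay_null, 1.0 - p_stay_null],
                      [1.0 - p_stay_assoc, p_stay_assoc]])
    init = np.array([1.0 - init_assoc, init_assoc])

    # Forward pass, rescaled at each step to avoid numerical underflow.
    alpha = np.zeros((n, 2))
    alpha[0] = init * emis[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass, also rescaled; the scaling cancels in the final normalization.
    beta = np.ones((n, 2))
    for t in range(n - 2, -1, -1):
        beta[t] = trans @ (emis[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    post = alpha * beta
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 1]


# Hypothetical z-scores for five adjacent features in one group.
example_z = [0.1, -0.3, 4.5, 3.8, 0.2]
print(np.round(posterior_association(example_z), 3))
```

In this sketch, features with large absolute z-scores, and to a lesser extent their neighbours, receive higher posterior probabilities, which is the kind of feature-level evidence the abstract describes as useful for follow-up studies.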