Wen Yalu, Burt Alexandra, Lu Qing
Institute of Cancer Stem Cell, Dalian Medical University, Liaoning, 116044, China
Department of Statistics, University of Auckland, 1010, New Zealand
Genetics. 2017 Sep;207(1):63-73. doi: 10.1534/genetics.117.199752. Epub 2017 Jul 5.
The family-based design is one of the most popular designs in genetic studies and has many unique features for risk-prediction research. It is robust against genetic heterogeneity, and the relatedness among family members can be informative for predicting an individual's risk for diseases with polygenic and shared environmental components of risk. Despite these strengths, family-based designs have been used infrequently in current risk-prediction studies, and their related statistical methods have not been well developed. In this article, we developed a generalized random field (GRF) method for family-based risk-prediction modeling on sequencing data. In GRF, subjects' phenotypes are viewed as stochastic realizations of a random field in a space, and a subject's phenotype is predicted from adjacent subjects, where adjacencies between subjects are determined by their genetic and within-family similarities. Unlike existing methods, which adjust for familial correlations, the GRF uses this information to form surrogates that further improve prediction accuracy. It also uses within-family information to capture predictors (e.g., rare mutations) that are homogeneous within families. Through simulations, we have demonstrated that the GRF method attained better performance than an existing method by considering additional information from family members and accounting for genetic heterogeneity. We further provided practical recommendations for designing family-based risk-prediction studies. Finally, we illustrated the GRF method with an application to a whole-genome exome data set from the Michigan State University Twin Registry study.
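The idea of predicting a subject's phenotype from adjacent subjects, with adjacency defined by genetic and within-family similarity, can be illustrated with a minimal sketch. This is not the authors' GRF implementation: the IBS-style similarity, the same-family indicator, the mixing weight `alpha`, and all function names below are assumptions chosen only for illustration, and the predictor is a plain similarity-weighted average rather than the fitted random-field model.

```python
import numpy as np

def genetic_similarity(G):
    """IBS-style similarity for genotypes coded 0/1/2 (an assumed choice):
    1 - mean(|g_i - g_j|) / 2, averaged over variants."""
    G = np.asarray(G, dtype=float)
    diff = np.abs(G[:, None, :] - G[None, :, :])  # shape (n, n, m)
    return 1.0 - diff.mean(axis=2) / 2.0

def family_similarity(family_ids):
    """1 if two subjects share a family ID, else 0 (an assumed choice)."""
    f = np.asarray(family_ids)
    return (f[:, None] == f[None, :]).astype(float)

def combined_similarity(G, family_ids, alpha=0.5):
    """Mix genetic and within-family similarity; alpha is hypothetical."""
    return alpha * genetic_similarity(G) + (1.0 - alpha) * family_similarity(family_ids)

def predict_from_neighbors(y, S, target):
    """Predict y[target] from all other subjects, weighted by similarity.
    The target's own phenotype is held out and never used."""
    y = np.asarray(y, dtype=float)
    mask = np.ones(len(y), dtype=bool)
    mask[target] = False
    mu = y[mask].mean()          # baseline from the remaining subjects
    w = S[target, mask]          # similarity of target to each neighbor
    if w.sum() == 0:
        return mu                # no informative neighbors: fall back to mean
    return mu + np.dot(w, y[mask] - mu) / w.sum()

# Toy data: two families of two subjects, three variants each.
G = [[2, 0, 1], [2, 0, 1], [0, 2, 0], [0, 2, 0]]
family_ids = [1, 1, 2, 2]
y = [1, 1, 0, 0]                 # binary phenotype (1 = affected)

S = combined_similarity(G, family_ids, alpha=0.5)
pred_first = predict_from_neighbors(y, S, target=0)
pred_third = predict_from_neighbors(y, S, target=2)
print(pred_first, pred_third)
```

In this toy example the held-out subject's family member, who is also genetically identical at the three variants, receives most of the weight, so the prediction is pulled toward that relative's phenotype rather than toward the overall mean.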