Héctor Corrada Bravo, Kristine E. Lee, Barbara E. K. Klein, Ronald Klein, Sudha K. Iyengar, Grace Wahba
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.
Proc Natl Acad Sci U S A. 2009 May 19;106(20):8128-33. doi: 10.1073/pnas.0902906106. Epub 2009 May 6.
We present a method for examining the relative influence of familial, genetic, and environmental covariate information in flexible nonparametric risk models. Our goal is to investigate the relative importance of these three sources of information as they relate to a particular outcome. To that end, we developed a method for incorporating arbitrary pedigree information in a smoothing spline ANOVA (SS-ANOVA) model. By expressing pedigree data as a positive semidefinite kernel matrix, the SS-ANOVA model is able to estimate a log-odds ratio as a multicomponent function of several variables: one or more functional components representing information from environmental covariates and/or genetic marker data, and another representing pedigree relationships. We report a case study on models for retinal pigmentary abnormalities in the Beaver Dam Eye Study. Our model verifies known facts about the epidemiology of this eye lesion, which is found in eyes with early age-related macular degeneration, and shows significantly increased predictive ability in models that include all three of the genetic, environmental, and familial data sources. The case study also shows that models containing only two of these data sources (pedigree and environmental covariates, pedigree and genetic markers, or environmental covariates and genetic markers) have comparable predictive ability to one another, but less than the model with all three. This result is consistent with the notions that genetic marker data encode, at least in part, pedigree data, and that familial correlations encode shared environment data as well.
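As an illustration of what "expressing pedigree data as a positive semidefinite kernel matrix" can look like, the following is a minimal sketch that builds a kinship-coefficient matrix from a small pedigree using the standard tabular recursion and checks that it is positive semidefinite. This is an assumption made for illustration only: the abstract does not say how the authors construct their kernel, and the names `kinship_matrix`, `is_psd`, and the four-person toy pedigree are hypothetical.

```python
import numpy as np


def kinship_matrix(pedigree):
    """Kinship coefficients via the tabular recursion.

    pedigree: list of (person_id, father_id, mother_id) tuples, with
    parents listed before their children and None for unknown parents
    (founders). This input format is an assumption of the sketch.
    """
    index = {pid: i for i, (pid, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    phi = np.zeros((n, n))
    for i, (_, father, mother) in enumerate(pedigree):
        f = index[father] if father is not None else None
        m = index[mother] if mother is not None else None
        # Self-kinship: 1/2 * (1 + kinship between the two parents).
        parents_kin = phi[f, m] if f is not None and m is not None else 0.0
        phi[i, i] = 0.5 * (1.0 + parents_kin)
        # Kinship with every earlier individual j: average of j's
        # kinship with each of i's parents (0 for an unknown parent).
        for j in range(i):
            via_father = phi[j, f] if f is not None else 0.0
            via_mother = phi[j, m] if m is not None else 0.0
            phi[i, j] = phi[j, i] = 0.5 * (via_father + via_mother)
    return phi


def is_psd(matrix, tol=1e-10):
    """True if a symmetric matrix has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(matrix).min() >= -tol)


# Toy nuclear family: two unrelated founders and their two children.
toy_pedigree = [
    ("father", None, None),
    ("mother", None, None),
    ("child1", "father", "mother"),
    ("child2", "father", "mother"),
]
phi = kinship_matrix(toy_pedigree)
kernel = 2.0 * phi  # twice the kinship: expected fraction of shared alleles
print(kernel)
print("positive semidefinite:", is_psd(kernel))
```

A matrix of this kind could then serve as the Gram matrix for a pedigree component in a kernel-based risk model, alongside separate components for environmental covariates and genetic markers; how those components are actually combined and fitted in the SS-ANOVA model is beyond what this sketch shows.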