Queensland Institute of Medical Research, Brisbane, QLD 4072, Australia.
Genet Epidemiol. 2012 Apr;36(3):214-24. doi: 10.1002/gepi.21614.
Genome-wide association studies have facilitated the construction of risk predictors for disease from multiple single nucleotide polymorphism (SNP) markers. The ability of such "genetic profiles" to predict outcome is usually quantified in an independent data set. The coefficient of determination (R²) has been a useful measure of the goodness-of-fit of the genetic profile. Various pseudo-R² measures for binary responses have been proposed. However, there is no standard or consensus measure because the concept of residual variance is not easily defined on the observed probability scale. Unlike for nongenetic predictors such as environmental exposure, prior information is available for genetic predictors: for most traits there are estimates of the heritability, the proportion of variation in risk in the population that is due to all genetic factors. It is this ability to benchmark against heritability that makes the choice of a goodness-of-fit measure in genetic profiling different from the choice for nongenetic predictors. In this study, we use a liability threshold model to establish the relationship between the observed probability scale and the underlying liability scale in measuring R² for binary responses. We show that currently used R² measures are difficult to interpret, biased by ascertainment, and not comparable to heritability. We suggest a novel and globally standard measure of R² that is interpretable on the liability scale. Furthermore, even when using ascertained case-control studies, which are typical in human disease studies, we can obtain an R² measure on the liability scale that can be compared directly to heritability.
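The abstract does not state the transformation itself, so the following is only a minimal sketch of how an observed-scale R² might be converted to the liability scale under a liability threshold model. The function name `liability_r2`, its parameters, and the specific expressions (a standard-normal threshold for population prevalence K, with an adjustment for the proportion of cases P in an ascertained sample) are assumptions based on the form of this conversion as it is commonly implemented, and should be checked against the full text before use.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal liability distribution (assumed)


def liability_r2(r2_obs, prevalence, case_fraction=None):
    """Convert an observed-scale R^2 to the liability scale (illustrative sketch).

    r2_obs        : R^2 from a linear model on the 0/1 disease outcome
    prevalence    : population prevalence K of the disease
    case_fraction : proportion of cases P in the sample; defaults to K,
                    i.e. a random population sample with no ascertainment
    """
    K = prevalence
    P = K if case_fraction is None else case_fraction

    t = _STD_NORMAL.inv_cdf(1.0 - K)  # liability threshold for prevalence K
    z = _STD_NORMAL.pdf(t)            # normal density at the threshold
    m = z / K                         # mean liability of cases

    # Scale factor between observed and liability scales, including the
    # correction for over-sampling of cases (equals K(1-K)/z^2 when P == K).
    c = (K * (1.0 - K) / z**2) * (K * (1.0 - K) / (P * (1.0 - P)))

    # Ascertainment term; it is zero when P == K.
    shift = m * (P - K) / (1.0 - K)
    theta = shift * (shift - t)

    return r2_obs * c / (1.0 + r2_obs * theta * c)
```

With no ascertainment (P equal to K) the expression reduces to R² × K(1 − K)/z², the classical observed-to-liability scaling; for example, `liability_r2(0.05, 0.01, case_fraction=0.5)` would represent a balanced case-control sample for a disease with 1% prevalence.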