Department of Genetics, Institute for Quantitative Biomedical Sciences, Dartmouth Medical School, Lebanon, NH, USA.
BioData Min. 2012 Sep 26;5(1):15. doi: 10.1186/1756-0381-5-15.
Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e., number of loci, heritability, and minor allele frequency). Such studies do not account for model architecture (i.e., the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. To design a simulation study that efficiently takes architecture into account, a reliable metric is needed for model selection.
Drawing on previous works, we evaluate three metrics as predictors of relative model detection difficulty: (1) penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model's EDM and COR are each stronger predictors of model detection success than heritability.
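To make the inputs to these metrics concrete, the following is a minimal Python sketch of how quantities of this kind can be computed from a penetrance table and its genotype frequencies. It is illustrative only and is not the GAMETES implementation. The function names are hypothetical, genotype frequencies are assumed to follow Hardy-Weinberg proportions with independent loci, heritability uses the usual variance-based definition for a penetrance function, and `penetrance_variance` is an assumed reading of PTV as the unweighted variance of the table entries. The formulas for COR and EDM are not stated in this abstract, so they are not reproduced here.

```python
from itertools import product


def genotype_freqs(maf):
    """Hardy-Weinberg genotype frequencies (AA, Aa, aa) for one biallelic locus."""
    q = maf
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def joint_freqs(mafs):
    """Multi-locus genotype frequencies, assuming independent loci."""
    per_locus = [genotype_freqs(m) for m in mafs]
    joint = {}
    for combo in product(range(3), repeat=len(mafs)):
        f = 1.0
        for locus, g in enumerate(combo):
            f *= per_locus[locus][g]
        joint[combo] = f
    return joint


def prevalence(pen, freqs):
    """Population prevalence K: frequency-weighted mean penetrance."""
    return sum(freqs[g] * pen[g] for g in pen)


def heritability(pen, freqs):
    """Frequency-weighted variance of penetrance, scaled by K * (1 - K)."""
    k = prevalence(pen, freqs)
    var = sum(freqs[g] * (pen[g] - k) ** 2 for g in pen)
    return var / (k * (1.0 - k))


def penetrance_variance(pen):
    """Assumed reading of PTV: unweighted variance of the table entries."""
    vals = list(pen.values())
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


# Hypothetical two-locus example: an XOR-like table in which only genotype
# combinations with an odd total genotype index carry disease risk.
example_pen = {
    (a, b): 0.1 if (a + b) % 2 == 1 else 0.0
    for a in range(3)
    for b in range(3)
}
example_freqs = joint_freqs([0.5, 0.5])

print("prevalence:", prevalence(example_pen, example_freqs))
print("heritability:", heritability(example_pen, example_freqs))
print("penetrance variance:", penetrance_variance(example_pen))
```

Two tables with the same heritability, number of loci, and minor allele frequencies can still arrange their penetrance values differently, which is the architectural variation the metrics above are meant to capture.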
This study formally identifies and evaluates metrics that quantify model detection difficulty. We use these metrics to intelligently select models from a population of potential architectures. This enables an improved simulation study design that accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and use of EDM and COR in GAMETES, an algorithm that rapidly and precisely generates pure, strict, n-locus epistatic models.