Department of Epidemiology, Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.
Bioinformatics. 2012 Jul 1;28(13):1729-37. doi: 10.1093/bioinformatics/bts259. Epub 2012 May 3.
The question of how best to use information from known associated variants when conducting disease association studies has yet to be answered. Some studies compute a marginal P-value for each single nucleotide polymorphism (SNP) independently, ignoring previously discovered variants. Other studies include known variants as covariates in logistic regression, but a weakness of this standard conditioning strategy is that it does not account for disease prevalence and non-random ascertainment, which can induce a correlation structure between candidate variants and known associated variants even if the variants lie on different chromosomes. Here, we propose a new conditioning approach, which is based in part on the classical technique of liability threshold modeling. Roughly, this method estimates model parameters for each known variant while accounting for the published disease prevalence from the epidemiological literature.
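To make the liability threshold idea concrete, here is a minimal sketch of the classical model itself, not the LTSOFT implementation. It assumes liability is standardized to N(0, 1) in the population, that the known variants contribute a centered term `genetic_mean` on the liability scale explaining a fraction `var_explained` of its variance, and that disease occurs when liability exceeds a threshold fixed by the published prevalence. The function names are hypothetical. It returns the expected liability of an individual given their known-variant contribution and case/control status, using truncated-normal means.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def liability_threshold(prevalence: float) -> float:
    """Threshold T such that P(L > T) = prevalence for L ~ N(0, 1)."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def posterior_mean_liability(is_case: bool, genetic_mean: float,
                             var_explained: float, prevalence: float) -> float:
    """Expected liability given known-variant contribution and phenotype.

    Hypothetical sketch: L = genetic_mean + e, with e ~ N(0, 1 - var_explained)
    and genetic_mean assumed centered so that L ~ N(0, 1) overall.
    Cases satisfy L > T; controls satisfy L <= T.
    """
    t = liability_threshold(prevalence)
    sigma = (1.0 - var_explained) ** 0.5  # sd of the residual (non-modeled) liability
    z = (t - genetic_mean) / sigma        # standardized distance to the threshold
    if is_case:
        # E[L | L > T] = mu + sigma * phi(z) / (1 - Phi(z))
        return genetic_mean + sigma * STD_NORMAL.pdf(z) / (1.0 - STD_NORMAL.cdf(z))
    # E[L | L <= T] = mu - sigma * phi(z) / Phi(z)
    return genetic_mean - sigma * STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


if __name__ == "__main__":
    # Prevalence 1%: the threshold sits about 2.33 SD above the population mean.
    print(liability_threshold(0.01))
    # A case with a higher known-variant contribution needs less residual liability
    # to cross the threshold, so its expected residual (posterior - mu) is smaller.
    for mu in (0.0, 0.5):
        post = posterior_mean_liability(True, mu, 0.1, 0.01)
        print(mu, post, post - mu)
```

The shrinking residual among cases with high known-variant load is one way to see how ascertainment on disease status can couple a candidate variant to the known variants even when they are unlinked.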
We show via simulation and application to empirical datasets that our approach outperforms both the no-conditioning strategy and the standard conditioning strategy, with a properly controlled false-positive rate. Furthermore, in multiple datasets involving diseases of low prevalence, standard conditioning produces a severe drop in test statistics, whereas our approach generally performs as well as or better than no conditioning. Our approach may substantially improve disease gene discovery for diseases with many known risk variants.
LTSOFT software is available online at http://www.hsph.harvard.edu/faculty/alkes-price/software/.