Kullback-Leibler divergence for detection of rare haplotype common disease association.

Author information

Lin Shili

Affiliation

Department of Statistics, The Ohio State University, Columbus, OH, USA.

Publication information

Eur J Hum Genet. 2015 Nov;23(11):1558-65. doi: 10.1038/ejhg.2015.25. Epub 2015 Mar 4.

Abstract

Rare haplotypes may tag rare causal variants of common diseases; hence, detection of such rare haplotypes may also contribute to our understanding of complex disease etiology. Because rare haplotypes frequently result from common single-nucleotide polymorphisms (SNPs), focusing on rare haplotypes is much more economical compared with using rare single-nucleotide variants (SNVs) from sequencing, as SNPs are available and 'free' from already amassed genome-wide studies. Further, associated haplotypes may shed light on the underlying disease causal mechanism, a feat unmatched by SNV-based collapsing methods. In recent years, data mining approaches have been adapted to detect rare haplotype association. However, as they rely on an assumed underlying disease model and require the specification of a null haplotype, results can be erroneous if such assumptions are violated. In this paper, we present a haplotype association method based on Kullback-Leibler divergence (hapKL) for case-control samples. The idea is to compare haplotype frequencies for the cases versus the controls by computing symmetrical divergence measures. An important property of such measures is that both the frequencies and logarithms of the frequencies contribute in parallel, thus balancing the contributions from rare and common, and accommodating both deleterious and protective, haplotypes. A simulation study under various scenarios shows that hapKL has well-controlled type I error rates and good power compared with existing data mining methods. Application of hapKL to age-related macular degeneration (AMD) shows a strong association of the complement factor H (CFH) gene with AMD, identifying several individual rare haplotypes with strong signals.
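The comparison described in the abstract can be illustrated with a small sketch. This is not the hapKL implementation. It assumes phased haplotypes are already known for every chromosome, uses a symmetrized Kullback-Leibler (Jeffreys) divergence as one example of a symmetrical divergence measure, and assesses significance with a simple permutation of case-control labels. The function names, the 0.5 pseudo-count, and the permutation scheme are all assumptions made for illustration. Each haplotype contributes a term (p - q) * log(p / q), so the frequency difference and the log ratio enter together, and the term is non-negative whether the haplotype is enriched in cases or in controls.

```python
import math
import random
from collections import Counter


def haplotype_freqs(haps, support, pseudo=0.5):
    """Smoothed relative frequencies over a fixed set of haplotypes.

    The pseudo-count (an assumption of this sketch) keeps every frequency
    positive so that the logarithms below are always defined.
    """
    counts = Counter(haps)
    total = len(haps) + pseudo * len(support)
    return {h: (counts[h] + pseudo) / total for h in support}


def symmetric_kl(p, q):
    """Jeffreys divergence: KL(p||q) + KL(q||p) = sum of (p - q) * log(p / q)."""
    return sum((p[h] - q[h]) * math.log(p[h] / q[h]) for h in p)


def permutation_pvalue(cases, controls, n_perm=1000, seed=0):
    """Observed divergence and a permutation p-value (hypothetical test scheme)."""
    rng = random.Random(seed)
    support = sorted(set(cases) | set(controls))

    def stat(a, b):
        return symmetric_kl(haplotype_freqs(a, support),
                            haplotype_freqs(b, support))

    observed = stat(cases, controls)
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        # Shuffle case/control labels while keeping group sizes fixed.
        rng.shuffle(pooled)
        if stat(pooled[:n_cases], pooled[n_cases:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: haplotype "101" is rare overall but more frequent among cases.
cases = ["000"] * 90 + ["101"] * 10
controls = ["000"] * 99 + ["101"] * 1
observed, p_value = permutation_pvalue(cases, controls, n_perm=200)
print(observed, p_value)
```

Because the statistic is symmetric in its two arguments, swapping the case and control labels leaves it unchanged, which is why no reference (null) haplotype has to be designated in this sketch.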
