Lee Soyoun, Yang Jie, Huang Jiayu, Chen Hao, Hou Wei, Wu Song
Department of Pediatric Oncology and the Linde Program in Cancer Chemical Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, USA.
Department of Public Health and General Medicine, School of Integrated Traditional and Western Medicine, Anhui University of Chinese Medicine, Hefei, China.
Brief Bioinform. 2017 Mar 1;18(2):195-204. doi: 10.1093/bib/bbw006.
Single nucleotide polymorphisms (SNPs), the most common genetic markers in genome-wide association studies, are usually in linkage disequilibrium (LD) with each other within a small genomic region. Both single- and two-marker-based LD mapping methods have been developed to take advantage of these LD structures. In this study, a more general LD mapping framework that accommodates an arbitrary number of markers was developed to further improve LD mapping and its detection power. This method is referred to as multi-marker linkage disequilibrium mapping (mmLD). For parameter estimation, we implemented a two-phase procedure: first, haplotype frequencies were estimated for the known markers; then, the haplotype frequencies were updated to include the unknown quantitative trait loci, based on the estimates from the first phase. For hypothesis testing, we proposed a novel sequential likelihood ratio test procedure, which iteratively removed haplotypes with zero frequency and subsequently determined the proper degrees of freedom. To compare the proposed mmLD method with existing mapping methods, e.g. the adjusted single-marker LD mapping and SKAT_C, we performed extensive simulations under various scenarios. The simulation results demonstrated that mmLD has the same or higher power than the existing methods, while maintaining the correct type I error rate. We further applied mmLD to a public data set, 'GAW17', to investigate its applicability. The results showed that mmLD performed well. We concluded that this improved mmLD method will be useful for future genome-wide association studies and genetic association analyses.
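To make the first estimation phase and the zero-frequency pruning idea more concrete, the following is a minimal sketch, not the authors' implementation. It assumes only two biallelic SNPs with unphased genotypes coded as counts (0, 1, 2) of allele "1", and uses a standard EM algorithm to estimate the four haplotype frequencies. The function names `em_haplotype_freqs` and `prune_zero_haplotypes` and the threshold `eps` are hypothetical, chosen for illustration. The second phase (adding the unknown quantitative trait locus) and the likelihood ratio test itself are not shown.

```python
from collections import Counter

HAPLOTYPES = ("00", "01", "10", "11")  # allele at SNP 1, then allele at SNP 2


def em_haplotype_freqs(genotypes, n_iter=500, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    genotypes: list of (g1, g2) pairs, each g in {0, 1, 2}.
    Only the double heterozygote (1, 1) has ambiguous phase.
    """
    counts = Counter(genotypes)
    n_chrom = 2 * len(genotypes)
    p = {h: 0.25 for h in HAPLOTYPES}  # uniform starting values (assumption)
    for _ in range(n_iter):
        expected = dict.fromkeys(HAPLOTYPES, 0.0)
        for (g1, g2), c in counts.items():
            if g1 == 1 and g2 == 1:
                # E-step: split between the 00/11 and 01/10 phase resolutions
                cis = p["00"] * p["11"]
                trans = p["01"] * p["10"]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                expected["00"] += c * w
                expected["11"] += c * w
                expected["01"] += c * (1 - w)
                expected["10"] += c * (1 - w)
            else:
                # Phase is unambiguous: read the two haplotypes off directly
                a1 = (int(g1 == 2), int(g1 >= 1))
                a2 = (int(g2 == 2), int(g2 >= 1))
                for i in range(2):
                    expected[f"{a1[i]}{a2[i]}"] += c
        # M-step: expected haplotype counts divided by number of chromosomes
        new_p = {h: expected[h] / n_chrom for h in HAPLOTYPES}
        delta = max(abs(new_p[h] - p[h]) for h in HAPLOTYPES)
        p = new_p
        if delta < tol:
            break
    return p


def prune_zero_haplotypes(freqs, eps=1e-8):
    """Drop haplotypes with (numerically) zero frequency and renormalize."""
    kept = {h: f for h, f in freqs.items() if f > eps}
    total = sum(kept.values())
    return {h: f / total for h, f in kept.items()}


if __name__ == "__main__":
    sample = [(0, 0), (0, 0), (2, 2), (1, 1)]
    freqs = em_haplotype_freqs(sample)
    kept = prune_zero_haplotypes(freqs)
    print(freqs)
    print("retained haplotypes:", sorted(kept))
```

In this sketch, pruning reduces the number of free frequency parameters to the number of retained haplotypes minus one. How the degrees of freedom of the sequential likelihood ratio test are actually derived from the retained haplotypes is specified in the paper and is not reproduced here.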