Li Jing, Zhou Yingyao, Elston Robert C
Electrical Engineering and Computer Science Department, Case Western Reserve University, Cleveland, OH 44106, USA.
BMC Bioinformatics. 2006 May 18;7:258. doi: 10.1186/1471-2105-7-258.
With the availability of large-scale, high-density single-nucleotide polymorphism (SNP) markers, substantial effort has gone into identifying disease-causing genes using linkage disequilibrium (LD) mapping by haplotype analysis of unrelated individuals. In addition to complex diseases, many continuously distributed quantitative traits are of primary clinical and health significance. However, the development of association mapping methods for quantitative traits using unrelated individuals has received relatively little attention.
We recently developed an association mapping method for complex diseases by mining the sharing of haplotype segments (i.e., phased genotype pairs) in affected individuals that are rarely present in normal individuals. In this paper, we extend our previous work to address the problem of quantitative trait mapping from unrelated individuals. The method is non-parametric in nature, and statistical significance can be obtained by a permutation test. It can also be incorporated into the one-way ANCOVA (analysis of covariance) framework so that other factors and covariates can be easily incorporated. The effectiveness of the approach is demonstrated by extensive experimental studies using both simulated and real data sets. The results show that our haplotype-based approach is more robust than two statistical methods based on single markers: a single SNP association test (SSA) and the Mann-Whitney U-test (MWU). The algorithm has been incorporated into our existing software package called HapMiner, which is available from our website at http://www.eecs.case.edu/~jxl175/HapMiner.html.
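To illustrate how statistical significance can be obtained by a permutation test, the following is a minimal sketch and not the HapMiner implementation. It assumes a hypothetical setup in which each individual has a quantitative trait value and a boolean flag indicating whether the individual carries a haplotype segment from a given cluster. The test statistic used here (absolute difference in mean trait value between carriers and non-carriers) is a stand-in chosen for illustration; the function name and parameters are likewise assumptions.

```python
import random
from statistics import mean


def permutation_p_value(trait, in_cluster, n_perm=2000, seed=0):
    """Permutation p-value for a trait/haplotype-cluster association.

    trait      -- list of quantitative trait values, one per individual
    in_cluster -- list of booleans (hypothetical carrier flags), same length;
                  must contain at least one True and one False
    The statistic is an illustrative stand-in, not HapMiner's own score.
    """
    def stat(values):
        carriers = [v for v, c in zip(values, in_cluster) if c]
        others = [v for v, c in zip(values, in_cluster) if not c]
        return abs(mean(carriers) - mean(others))

    observed = stat(trait)
    rng = random.Random(seed)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        # Shuffling trait values breaks any real trait/haplotype link,
        # which simulates the null hypothesis of no association.
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    trait = [10.0 + 0.1 * i for i in range(10)] + [0.1 * i for i in range(10)]
    carriers = [True] * 10 + [False] * 10
    print(permutation_p_value(trait, carriers))
```

Because the null distribution is built from the data themselves, this kind of test makes no assumption about the distribution of the trait, which is consistent with the non-parametric nature of the method described above.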
For QTL (quantitative trait locus) fine mapping, identifying QTNs (quantitative trait nucleotides) with realistic effects (each QTN contributing less than 10% of the total variance of the trait) requires large sample sizes (≥ 500) for all three methods. The overall performance of HapMiner is better than that of the other two methods. Its effectiveness also depends on factors such as recombination rates and the density of typed SNPs. Haplotype-based methods might provide higher power than single-SNP methods when using tag SNPs selected from a small number of samples or from other sources (such as HapMap data). Rank-based statistics usually have much lower power, as shown in our study.