Estimating haplotype-disease associations with pooled genotype data.

Author information

Zeng D, Lin DY

Affiliation

Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599-7420, USA.

Publication information

Genet Epidemiol. 2005 Jan;28(1):70-82. doi: 10.1002/gepi.20040.

DOI:10.1002/gepi.20040

PMID:15558554

Abstract

The genetic dissection of complex human diseases requires large-scale association studies which explore the population associations between genetic variants and disease phenotypes. DNA pooling can substantially reduce the cost of genotyping assays in these studies, and thus enables one to examine a large number of genetic variants on a large number of subjects. The availability of pooled genotype data instead of individual data poses considerable challenges in the statistical inference, especially in the haplotype-based analysis because of increased phase uncertainty. Here we present a general likelihood-based approach to making inferences about haplotype-disease associations based on possibly pooled DNA data. We consider cohort and case-control studies of unrelated subjects, and allow arbitrary and unequal pool sizes. The phenotype can be discrete or continuous, univariate or multivariate. The effects of haplotypes on disease phenotypes are formulated through flexible regression models, which allow a variety of genetic hypotheses and gene-environment interactions. We construct appropriate likelihood functions for various designs and phenotypes, accommodating Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We develop simple and efficient numerical algorithms for calculating the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. We assess the performance of the proposed methods through simulation studies, and provide an application to the Finland-United States Investigation of NIDDM Genetics Study. The results show that DNA pooling is highly efficient in studying haplotype-disease associations. As a by-product, this work provides valid and efficient methods for estimating haplotype-disease associations with unpooled DNA samples.
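The phase-uncertainty problem that pooling creates can be made concrete with a small example. The sketch below is not the authors' method. It is a toy illustration that assumes Hardy-Weinberg equilibrium, biallelic SNPs coded 0/1, no phenotype or regression model, and brute-force enumeration of haplotype configurations. The function names (`compatible_configs`, `pool_likelihood`, `em_haplotype_freqs`) are hypothetical. Under these assumptions, each pool of `k` subjects contributes `2k` independent haplotypes, and only the per-SNP allele totals are observed. The pool likelihood therefore sums multinomial probabilities over every haplotype count vector consistent with those totals, and a standard EM algorithm re-estimates haplotype frequencies from the expected counts.

```python
from itertools import product
from math import factorial, prod


def haplotypes(n_snps):
    """All 2**n_snps biallelic haplotypes, alleles coded 0/1."""
    return list(product((0, 1), repeat=n_snps))


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative ints that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multinomial_coef(counts):
    """Number of distinct orderings of a haplotype count vector."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)


def compatible_configs(pool_alleles, pool_size, haps):
    """Haplotype count vectors whose per-SNP allele totals match the pool."""
    n_haps = 2 * pool_size  # two haplotypes per diploid subject
    configs = []
    for counts in compositions(n_haps, len(haps)):
        totals = tuple(
            sum(c * h[snp] for c, h in zip(counts, haps))
            for snp in range(len(pool_alleles))
        )
        if totals == tuple(pool_alleles):
            configs.append(counts)
    return configs


def config_prob(counts, freqs):
    """Multinomial probability of a count vector (HWE: haplotypes are iid)."""
    return multinomial_coef(counts) * prod(f ** c for f, c in zip(freqs, counts))


def pool_likelihood(freqs, pool_alleles, pool_size, haps):
    """P(pooled allele counts | haplotype frequencies), summed over phase."""
    configs = compatible_configs(pool_alleles, pool_size, haps)
    return sum(config_prob(c, freqs) for c in configs)


def em_haplotype_freqs(pools, n_snps, n_iter=200):
    """EM estimates of haplotype frequencies from (allele_counts, pool_size) pairs."""
    haps = haplotypes(n_snps)
    n_types = len(haps)
    freqs = [1.0 / n_types] * n_types  # uniform starting values
    configs_per_pool = [compatible_configs(a, k, haps) for a, k in pools]
    total_haps = sum(2 * k for _, k in pools)
    for _ in range(n_iter):
        expected = [0.0] * n_types
        for configs in configs_per_pool:
            # E-step: posterior weight of each phase configuration in this pool
            weights = [config_prob(c, freqs) for c in configs]
            norm = sum(weights)
            for w, counts in zip(weights, configs):
                for j in range(n_types):
                    expected[j] += (w / norm) * counts[j]
        # M-step: expected haplotype counts divided by the total number of haplotypes
        freqs = [e / total_haps for e in expected]
    return haps, freqs


if __name__ == "__main__":
    # Three "pools" of one subject each; the (1, 1) pool is phase-ambiguous.
    pools = [((0, 0), 1), ((2, 2), 1), ((1, 1), 1)]
    haps, freqs = em_haplotype_freqs(pools, n_snps=2)
    for h, f in zip(haps, freqs):
        print(h, round(f, 3))
```

In this toy run, the ambiguous pool could hold either {00, 11} or {01, 10}. EM resolves it toward the haplotypes supported by the unambiguous pools, so the frequencies of (0, 0) and (1, 1) approach 0.5 and those of (0, 1) and (1, 0) approach 0. Pools of unequal size are handled by giving each pool its own `pool_size`. With `pool_size=1`, the procedure reduces to ordinary EM for unphased individual genotypes. The enumeration grows combinatorially with pool size and SNP count, so this sketch is practical only for very small examples. It also omits the Hardy-Weinberg disequilibrium and phenotype regression models described in the abstract.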