An Incomplete-Data Quasi-likelihood Approach to Haplotype-Based Genetic Association Studies on Related Individuals.

Authors

Wang Zuoheng, McPeek Mary Sara

Affiliation

Department of Statistics, University of Chicago, Chicago, IL 60637 (E-mail:

Publication details

J Am Stat Assoc. 2009 Sep 1;104(487):1251-1260. doi: 10.1198/jasa.2009.tm08507.

Abstract

We propose an incomplete-data, quasi-likelihood framework, for estimation and score tests, which accommodates both dependent and partially-observed data. The motivation comes from genetic association studies, where we address the problems of estimating haplotype frequencies and testing association between a disease and haplotypes of multiple tightly-linked genetic markers, using case-control samples containing related individuals. We consider a more general setting in which the complete data are dependent with marginal distributions following a generalized linear model. We form a vector Z whose elements are conditional expectations of the elements of the complete-data vector, given selected functions of the incomplete data. Assuming that the covariance matrix of Z is available, we form an optimal linear estimating function based on Z, which we solve by an iterative method. This approach addresses key difficulties in the haplotype frequency estimation and testing problems in related individuals: (1) dependence that is known but can be complicated; (2) data that are incomplete for structural reasons, as well as possibly missing, with different amounts of information for different observations; (3) the need for computational speed in order to analyze large numbers of markers; (4) a well-established null model, but an alternative model that is unknown and is problematic to fully specify in related individuals. For haplotype analysis, we give sufficient conditions for consistency and asymptotic normality of the estimator and asymptotic χ² null distribution of the score test. We apply the method to test for association of haplotypes with alcoholism in the GAW 14 COGA data set.
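As a rough illustration of what "an optimal linear estimating function ... which we solve by an iterative method" can look like, the sketch below solves a standard quasi-score equation D^T Σ^{-1} (Z − μ(β)) = 0 by Fisher scoring. It is not the authors' algorithm. The function name `solve_quasi_score`, the logit link, and the simplification that Z and its covariance Σ are fixed inputs are all assumptions made here; in the setting described above, Z is built from conditional expectations, so it would generally have to be recomputed as the parameter estimate changes.

```python
import numpy as np

def solve_quasi_score(z, X, sigma, n_iter=50, tol=1e-10):
    """Hypothetical helper: solve D^T Sigma^{-1} (z - mu(beta)) = 0.

    Assumptions (not taken from the paper): logit link for the mean,
    and z and sigma held fixed across iterations.
    """
    beta = np.zeros(X.shape[1])
    sigma_inv = np.linalg.inv(sigma)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))      # marginal means under the GLM
        D = (mu * (1.0 - mu))[:, None] * X        # derivative of mu w.r.t. beta
        score = D.T @ sigma_inv @ (z - mu)        # linear estimating function
        info = D.T @ sigma_inv @ D                # quasi-information matrix
        step = np.linalg.solve(info, score)       # Fisher-scoring update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy usage: four observations, the first two correlated (for example a
# sibling pair), with an intercept-only mean model.
z = np.array([1.0, 0.5, 0.5, 0.0])
X = np.ones((4, 1))
sigma = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.5, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
beta_hat = solve_quasi_score(z, X, sigma)
```

With an intercept-only model, the fitted mean reduces to the covariance-weighted average 1^T Σ^{-1} z / 1^T Σ^{-1} 1, which down-weights the correlated pair relative to the unrelated observations.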


