An Incomplete-Data Quasi-likelihood Approach to Haplotype-Based Genetic Association Studies on Related Individuals.

Authors

Wang Zuoheng, McPeek Mary Sara

Affiliation

Department of Statistics, University of Chicago, Chicago, IL 60637 (E-mail:

Publication information

J Am Stat Assoc. 2009 Sep 1;104(487):1251-1260. doi: 10.1198/jasa.2009.tm08507.

DOI:10.1198/jasa.2009.tm08507

PMID:20428335

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2860453/

Abstract

We propose an incomplete-data, quasi-likelihood framework, for estimation and score tests, which accommodates both dependent and partially-observed data. The motivation comes from genetic association studies, where we address the problems of estimating haplotype frequencies and testing association between a disease and haplotypes of multiple tightly-linked genetic markers, using case-control samples containing related individuals. We consider a more general setting in which the complete data are dependent with marginal distributions following a generalized linear model. We form a vector Z whose elements are conditional expectations of the elements of the complete-data vector, given selected functions of the incomplete data. Assuming that the covariance matrix of Z is available, we form an optimal linear estimating function based on Z, which we solve by an iterative method. This approach addresses key difficulties in the haplotype frequency estimation and testing problems in related individuals: (1) dependence that is known but can be complicated; (2) data that are incomplete for structural reasons, as well as possibly missing, with different amounts of information for different observations; (3) the need for computational speed in order to analyze large numbers of markers; (4) a well-established null model, but an alternative model that is unknown and is problematic to fully specify in related individuals. For haplotype analysis, we give sufficient conditions for consistency and asymptotic normality of the estimator and asymptotic χ² null distribution of the score test. We apply the method to test for association of haplotypes with alcoholism in the GAW 14 COGA data set.
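To make the idea of an optimal linear estimating function concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified setting that the abstract does not state: Z is treated as fixed, its mean is linear in the parameter (E[Z] = Dθ), and its covariance is known up to a constant. Under those assumptions the estimating equation Dᵀ Σ⁻¹ (Z − Dθ) = 0 has a closed-form solution and no iteration is needed; the abstract's method is solved iteratively instead. The function name, the toy data, and the kinship-based covariance are all hypothetical and chosen only for illustration.

```python
import numpy as np


def solve_linear_estimating_equation(z, design, cov):
    """Return theta solving design^T cov^{-1} (z - design @ theta) = 0.

    Simplifying assumptions (not from the abstract): z is fixed, its mean is
    design @ theta, and cov is its covariance up to a constant factor.
    """
    cov_inv_design = np.linalg.solve(cov, design)  # cov^{-1} D
    cov_inv_z = np.linalg.solve(cov, z)            # cov^{-1} z
    lhs = design.T @ cov_inv_design                # D^T cov^{-1} D
    rhs = design.T @ cov_inv_z                     # D^T cov^{-1} z
    return np.linalg.solve(lhs, rhs)


# Hypothetical toy data: half the allele count at one marker for three
# outbred individuals; the first two are full siblings, the third is unrelated.
z = np.array([1.0, 0.5, 0.0])
design = np.ones((3, 1))  # common mean: the allele frequency

# Assumed covariance up to the factor p(1 - p), which cancels in the solution:
# 1/2 on the diagonal, kinship coefficient 1/4 between the two siblings.
cov = np.array([
    [0.50, 0.25, 0.00],
    [0.25, 0.50, 0.00],
    [0.00, 0.00, 0.50],
])

theta_hat = solve_linear_estimating_equation(z, design, cov)
naive_mean = z.mean()
print(theta_hat[0], naive_mean)
```

In this toy example the two siblings carry partly redundant information, so each receives less weight than the unrelated individual, and the weighted estimate (about 0.43) differs from the unweighted mean (0.5).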

Similar articles

Haplotype-based regression analysis and inference of case-control studies with unphased genotypes and measurement errors in environmental exposures.

Biometrics. 2008 Sep;64(3):673-684. doi: 10.1111/j.1541-0420.2007.00930.x. Epub 2007 Nov 12.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Quantifying the amount of missing information in genetic association studies.

Genet Epidemiol. 2006 Dec;30(8):703-17. doi: 10.1002/gepi.20181.

Assessment of linkage disequilibrium by the decay of haplotype sharing, with application to fine-scale genetic mapping.

Am J Hum Genet. 1999 Sep;65(3):858-75. doi: 10.1086/302537.

Modeling and E-M estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals.

Hum Hered. 2003;55(4):179-90. doi: 10.1159/000073202.

Inference on haplotype effects in case-control studies using unphased genotype data.

Am J Hum Genet. 2003 Dec;73(6):1316-29. doi: 10.1086/380204. Epub 2003 Nov 20.

Estimating population haplotype frequencies from pooled SNP data using incomplete database information.

Bioinformatics. 2009 Dec 15;25(24):3296-302. doi: 10.1093/bioinformatics/btp584. Epub 2009 Oct 27.

PoooL: an efficient method for estimating haplotype frequencies from large DNA pools.

Bioinformatics. 2008 Sep 1;24(17):1942-8. doi: 10.1093/bioinformatics/btn324. Epub 2008 Jun 23.

Multi-SNP Haplotype Analysis Methods for Association Analysis.

Methods Mol Biol. 2017;1666:485-504. doi: 10.1007/978-1-4939-7274-6_24.

Cited by

CERAMIC: Case-Control Association Testing in Samples with Related Individuals, Based on Retrospective Mixed Model Analysis with Adjustment for Covariates.

PLoS Genet. 2016 Oct 3;12(10):e1006329. doi: 10.1371/journal.pgen.1006329. eCollection 2016 Oct.

Mega2: validated data-reformatting for linkage and association analyses.

Source Code Biol Med. 2014 Dec 5;9(1):26. doi: 10.1186/s13029-014-0026-y. eCollection 2014.

Statistical methods for genome-wide and sequencing association studies of complex traits in related samples.

Curr Protoc Hum Genet. 2015 Jan 20;84:1.28.1-1.28.9. doi: 10.1002/0471142905.hg0128s84.

Association analysis of complex diseases using triads, parent-child dyads and singleton monads.

BMC Genet. 2013 Sep 4;14:78. doi: 10.1186/1471-2156-14-78.

MASTOR: mixed-model association mapping of quantitative traits in samples with related individuals.

Am J Hum Genet. 2013 May 2;92(5):652-66. doi: 10.1016/j.ajhg.2013.03.014.

BLUP genotype imputation for case-control association testing with related individuals and missing data.

J Comput Biol. 2012 Jun;19(6):756-65. doi: 10.1089/cmb.2012.0024.

XM: association testing on the X-chromosome in case-control samples with related individuals.

Genet Epidemiol. 2012 Jul;36(5):438-50. doi: 10.1002/gepi.21638. Epub 2012 May 2.

A two-marker haplotype in the IRF5 gene is associated with inflammatory bowel disease in a North American cohort.

Genes Immun. 2012 Jun;13(4):351-5. doi: 10.1038/gene.2011.90. Epub 2012 Jan 19.

ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure.

Am J Hum Genet. 2010 Feb 12;86(2):172-84. doi: 10.1016/j.ajhg.2010.01.001. Epub 2010 Feb 4.

ATRIUM: testing untyped SNPs in case-control association studies with related individuals.

Am J Hum Genet. 2009 Nov;85(5):667-78. doi: 10.1016/j.ajhg.2009.10.006.

References

Case-control association testing with related individuals: a more powerful quasi-likelihood score test.

Am J Hum Genet. 2007 Aug;81(2):321-37. doi: 10.1086/519497. Epub 2007 Jul 10.

Quantifying the amount of missing information in genetic association studies.

Genet Epidemiol. 2006 Dec;30(8):703-17. doi: 10.1002/gepi.20181.

Description of the data from the Collaborative Study on the Genetics of Alcoholism (COGA) and single-nucleotide polymorphism genotyping for Genetic Analysis Workshop 14.

BMC Genet. 2005 Dec 30;6 Suppl 1(Suppl 1):S2. doi: 10.1186/1471-2156-6-S1-S2.

Multilocus linkage disequilibrium mapping by the decay of haplotype sharing with samples of related individuals.

Genet Epidemiol. 2005 Sep;29(2):128-40. doi: 10.1002/gepi.20081.

Case-control single-marker and haplotypic association analysis of pedigree data.

Genet Epidemiol. 2005 Feb;28(2):110-22. doi: 10.1002/gepi.20051.

Evaluating associations of haplotypes with traits.

Genet Epidemiol. 2004 Dec;27(4):348-64. doi: 10.1002/gepi.20037.

Best linear unbiased allele-frequency estimation in complex pedigrees.

Biometrics. 2004 Jun;60(2):359-67. doi: 10.1111/j.0006-341X.2004.00180.x.

Family-based tests for associating haplotypes with general phenotype data: application to asthma genetics.

Genet Epidemiol. 2004 Jan;26(1):61-9. doi: 10.1002/gepi.10295.

Novel case-control test in a founder population identifies P-selectin as an atopy-susceptibility locus.

Am J Hum Genet. 2003 Sep;73(3):612-26. doi: 10.1086/378208. Epub 2003 Aug 15.

On the identification of disease mutations by the analysis of haplotype similarity and goodness of fit.

Am J Hum Genet. 2003 Apr;72(4):891-902. doi: 10.1086/373881. Epub 2003 Feb 27.
