A Likelihood-Based Approach for Missing Genotype Data.

Author information

D'Angelo Gina M, Kamboh M Ilyas, Feingold Eleanor

Affiliation

Division of Biostatistics, Washington University School of Medicine, St. Louis, Mo., USA.

Publication information

Hum Hered. 2010;69(3):171-83. doi: 10.1159/000273732.

DOI:10.1159/000273732

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7077088/

Abstract

Missing genotype data in a candidate gene association study can make it difficult to model the effects of multiple genetic variants simultaneously. In particular, when regression models are used to model phenotype as a function of SNP genotypes in several different genes, the most common approach is a complete case analysis, in which only individuals with no missing genotypes are included. But this can lead to substantial reduction in sample size and thus potential bias and loss in efficiency. A number of other methods for handling missing data are applicable, but have rarely been used in this context. The purpose of this paper is to describe how several standard methods for handling missing data can be applied or adapted to this problem, and to compare their performance using a simulation study. We demonstrate these techniques using an Alzheimer's disease association study. We show that the expectation-maximization algorithm and multiple imputation with a bootstrapped expectation-maximization sampling algorithm have the best properties of all the estimators studied.
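
To make the sample-size concern with complete case analysis concrete, the following is a minimal sketch and is not taken from the paper. It assumes, purely for illustration, that each SNP genotype is missing independently with the same probability, so the expected fraction of individuals retained is (1 - missing_rate) ** n_snps. The function names and the example numbers are hypothetical.

```python
def expected_complete_fraction(missing_rate: float, n_snps: int) -> float:
    """Expected fraction of individuals with no missing genotype.

    Illustrative assumption (not from the paper): every SNP is missing
    independently of the others and of the phenotype, at the same rate.
    """
    if not 0.0 <= missing_rate <= 1.0:
        raise ValueError("missing_rate must be between 0 and 1")
    if n_snps < 0:
        raise ValueError("n_snps must be non-negative")
    return (1.0 - missing_rate) ** n_snps


def expected_complete_cases(n_individuals: int, missing_rate: float, n_snps: int) -> float:
    """Expected number of individuals kept by a complete case analysis."""
    return n_individuals * expected_complete_fraction(missing_rate, n_snps)


if __name__ == "__main__":
    # Hypothetical study: 1000 individuals, 10 SNPs, 5% missing per SNP.
    kept = expected_complete_cases(1000, 0.05, 10)
    print(f"Expected complete cases: {kept:.0f} of 1000")
```

Under these assumptions, a modest 5% missing rate per SNP across 10 SNPs leaves only about 60% of individuals in a complete case analysis. Real missingness patterns are usually neither independent nor uniform, so this should be read as a rough guide only.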
