An E-M algorithm and testing strategy for multiple-locus haplotypes.

Authors

Long J C, Williams R C, Urbanek M

Affiliation

Laboratory of Neurogenetics, NIAAA/NIH, Rockville, MD 20852.

Publication information

Am J Hum Genet. 1995 Mar;56(3):799-810.

PMID: 7887436

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1801177/

Abstract

This paper gives an expectation-maximization (EM) algorithm to obtain allele frequencies, haplotype frequencies, and gametic disequilibrium coefficients for multiple-locus systems. It permits high polymorphism and null alleles at all loci. This approach effectively deals with the primary estimation problems associated with such systems: there is not a one-to-one correspondence between phenotypic and genotypic categories, and sample sizes tend to be much smaller than the number of phenotypic categories. The EM method provides maximum-likelihood estimates and therefore allows hypothesis tests using likelihood-ratio statistics that follow chi-square (χ²) distributions in large samples. We also suggest a data resampling approach to estimate the sampling distributions of the test statistics. The resampling approach is more computer intensive, but it is applicable to all sample sizes. A strategy to test hypotheses about aggregate groups of gametic disequilibrium coefficients is recommended. This strategy minimizes the number of necessary hypothesis tests while at the same time describing the structure of disequilibrium. These methods are applied to three unlinked dinucleotide repeat loci in Navajo Indians and to three linked HLA loci in Gila River (Pima) Indians. The likelihood functions of both data sets are shown to be maximized by the EM estimates, and the testing strategy provides a useful description of the structure of gametic disequilibrium. Following these applications, a number of simulation experiments are performed to test how well the likelihood-ratio statistic distributions are approximated by χ² distributions. In most circumstances the χ² approximation grossly underestimated the probability of type I errors; at times, however, it overestimated that probability. Accordingly, we recommend hypothesis tests that use the resampling method.
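As a rough illustration of the ideas in the abstract, the sketch below runs an EM iteration for haplotype frequencies, computes a likelihood-ratio statistic against the no-disequilibrium hypothesis, and estimates a p-value by resampling. It is not the authors' program. It assumes the simplest possible case, two biallelic codominant loci with no null alleles and Hardy-Weinberg proportions, whereas the paper treats multiple loci, many alleles, and null alleles. All function names and the example table are hypothetical, and the permutation scheme (shuffling locus B genotypes among individuals) is only one plausible resampling procedure.

```python
import math
import random

# Haplotype index is 2*a + b for allele a at locus A and allele b at locus B.
ALLELES = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # genotype code -> alleles carried


def em_haplotypes(counts, tol=1e-12, max_iter=10_000):
    """counts[i][j]: individuals with i copies of allele 1 at A and j at B."""
    n = sum(map(sum, counts))
    base = [0.0] * 4  # haplotype counts from phase-unambiguous genotypes
    for i in range(3):
        for j in range(3):
            if (i, j) != (1, 1):  # one locus is homozygous, so phase is known
                for a, b in zip(ALLELES[i], ALLELES[j]):
                    base[2 * a + b] += counts[i][j]
    d = counts[1][1]  # double heterozygotes: phase is 00/11 or 01/10
    p = [0.25] * 4
    for _ in range(max_iter):
        cis, trans = p[0] * p[3], p[1] * p[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5  # E-step
        extra = (d * w, d * (1 - w), d * (1 - w), d * w)
        new = [(base[k] + extra[k]) / (2 * n) for k in range(4)]  # M-step
        converged = max(abs(x - y) for x, y in zip(new, p)) < tol
        p = new
        if converged:
            break
    return p


def log_likelihood(counts, p):
    """Multinomial log-likelihood of the genotype table given haplotype freqs."""
    g = [[0.0] * 3 for _ in range(3)]
    for h1 in range(4):
        for h2 in range(4):  # ordered haplotype pairs
            g[h1 // 2 + h2 // 2][h1 % 2 + h2 % 2] += p[h1] * p[h2]
    return sum(counts[i][j] * math.log(g[i][j])
               for i in range(3) for j in range(3) if counts[i][j] > 0)


def lr_statistic(counts):
    """2 * (logL at EM estimates - logL with haplotype freqs = allele products)."""
    n = sum(map(sum, counts))
    qa = sum(i * counts[i][j] for i in range(3) for j in range(3)) / (2 * n)
    qb = sum(j * counts[i][j] for i in range(3) for j in range(3)) / (2 * n)
    p0 = [(1 - qa) * (1 - qb), (1 - qa) * qb, qa * (1 - qb), qa * qb]
    p1 = em_haplotypes(counts)
    return 2 * (log_likelihood(counts, p1) - log_likelihood(counts, p0))


def permutation_p_value(counts, n_perm=200, seed=0):
    """Resampling p-value: shuffle locus-B genotypes among individuals."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(3) for j in range(3)
             for _ in range(counts[i][j])]
    a = [i for i, _ in cells]
    b = [j for _, j in cells]
    observed = lr_statistic(counts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(b)
        table = [[0] * 3 for _ in range(3)]
        for i, j in zip(a, b):
            table[i][j] += 1
        hits += lr_statistic(table) >= observed
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    example = [[20, 5, 1], [6, 30, 4], [1, 5, 18]]  # made-up genotype counts
    print(em_haplotypes(example))
    print(lr_statistic(example))
    print(permutation_p_value(example))
```

In this reduced setting only the double heterozygote is phase-ambiguous, so the E-step is a single weight. With more loci or alleles the same idea applies, but the set of haplotype pairs compatible with each phenotype grows quickly, which is the estimation problem the abstract refers to.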

Similar articles

Testing for linkage disequilibrium in genotypic data using the Expectation-Maximization algorithm.

Heredity (Edinb). 1996 Apr;76(Pt 4):377-83. doi: 10.1038/hdy.1996.55.

[The use of the expectation-maximization (EM) algorithm for maximum likelihood estimation of gametic frequencies of multilocus polymorphic codominant systems based on sampled population data].

Genetika. 2002 Mar;38(3):407-18.

Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data.

Am J Hum Genet. 2000 Oct;67(4):947-59. doi: 10.1086/303069. Epub 2000 Aug 22.

Estimation of linkage disequilibrium for loci with multiple alleles: basic approach and an application using data from bighorn sheep.

Heredity (Edinb). 2001 Dec;87(Pt 6):698-708. doi: 10.1046/j.1365-2540.2001.00966.x.

The loss of statistical power to distinguish populations when certain samples are ambiguous.

Theor Popul Biol. 2003 Sep;64(2):177-92. doi: 10.1016/s0040-5809(03)00084-4.

Haplotype frequency estimation in patient populations: the effect of departures from Hardy-Weinberg proportions and collapsing over a locus in the HLA region.

Genet Epidemiol. 2002 Feb;22(2):186-95. doi: 10.1002/gepi.0163.

Pedigree disequilibrium tests for multilocus haplotypes.

Genet Epidemiol. 2003 Sep;25(2):115-21. doi: 10.1002/gepi.10252.

HAPLORE: a program for haplotype reconstruction in general pedigrees without recombination.

Bioinformatics. 2005 Jan 1;21(1):90-103. doi: 10.1093/bioinformatics/bth388. Epub 2004 Jul 1.

Simultaneous estimation of haplotype frequencies and quantitative trait parameters: applications to the test of association between phenotype and diplotype configuration.

Genetics. 2004 Sep;168(1):525-39. doi: 10.1534/genetics.104.029751.

Cited by

Epistatic interaction between ERAP2 and HLA modulates HIV-1 adaptation and disease outcome in an Australian population.

PLoS Pathog. 2024 Jul 9;20(7):e1012359. doi: 10.1371/journal.ppat.1012359. eCollection 2024 Jul.

Epistasis Between HLA-DRB1*16:02:01 and SLC16A11 T-C-G-T-T Reduces Odds for Type 2 Diabetes in Southwest American Indians.

Diabetes. 2024 Jun 1;73(6):1002-1011. doi: 10.2337/db23-0925.

Accurate construction of long range haplotype in unrelated individuals.

Stat Sin. 2013;23:1441-1461. doi: 10.5705/ss.2012.141s.

Evaluation of the influence of genetic variants in Cereblon gene on the response to the treatment of erythema nodosum leprosum with thalidomide.

Mem Inst Oswaldo Cruz. 2022 Nov 11;117:e220039. doi: 10.1590/0074-02760220039. eCollection 2022.

Protective association of HLA-DPB1*04:01:01 with acute encephalopathy with biphasic seizures and late reduced diffusion identified by HLA imputation.

Genes Immun. 2022 Jun;23(3-4):123-128. doi: 10.1038/s41435-022-00170-y. Epub 2022 Apr 14.

Next generation sequencing for HLA loci in full heritage Pima Indians of Arizona, Part II: HLA-A, -B, and -C with selected non-classical loci at 4-field resolution from whole genome sequences.

Hum Immunol. 2021 Jun;82(6):385-403. doi: 10.1016/j.humimm.2021.03.013. Epub 2021 Apr 17.

Statistical Method Based on Bayes-Type Empirical Score Test for Assessing Genetic Association with Multilocus Genotype Data.

Int J Genomics. 2020 May 6;2020:4708152. doi: 10.1155/2020/4708152. eCollection 2020.

Analysis of Hematological Traits in Polled Yak by Genome-Wide Association Studies Using Individual SNPs and Haplotypes.

Genes (Basel). 2019 Jun 17;10(6):463. doi: 10.3390/genes10060463.

Extent of third-order linkage disequilibrium in a composite line of Iberian pigs.

BMC Genet. 2018 Aug 17;19(1):60. doi: 10.1186/s12863-018-0661-4.

A rare CHD5 haplotype and its interactions with environmental factors predicting hepatocellular carcinoma risk.

BMC Cancer. 2018 Jun 15;18(1):658. doi: 10.1186/s12885-018-4551-y.
