A comparison of phasing algorithms for trios and unrelated individuals.

Authors

Marchini Jonathan, Cutler David, Patterson Nick, Stephens Matthew, Eskin Eleazar, Halperin Eran, Lin Shin, Qin Zhaohui S, Munro Heather M, Abecasis Goncalo R, Donnelly Peter

Affiliation

Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom.

Publication

Am J Hum Genet. 2006 Mar;78(3):437-50. doi: 10.1086/500808. Epub 2006 Jan 26.

DOI:10.1086/500808

PMID:16465620

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1380287/

Abstract

Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference to handle father-mother-child trios. We performed a comprehensive assessment of the methods applied both to trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the expected levels of genotyping error and missing data. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.
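Two ideas in the abstract can be made concrete with a short example. In a trio, Mendelian inheritance fixes the child's phase at every SNP where the child is homozygous or at least one parent is homozygous; only sites where father, mother, and child are all heterozygous remain ambiguous from the trio alone. Separately, r² between two SNPs is the squared correlation of their alleles across haplotypes, r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA·pB. The Python sketch below illustrates both under stated assumptions: biallelic SNPs, genotypes coded as the count of allele 1 (0, 1 or 2), no missing data or genotyping error, and, for r², haplotypes that are already known. The names `resolve_trio_site` and `r_squared` are hypothetical; this is not code from PHASE or any other program compared in the paper, and the r² estimators evaluated there work from genotype data rather than known haplotypes.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical helpers for illustration only; genotypes are coded as the
# number of copies of allele 1 (0, 1 or 2) at a biallelic SNP.


def resolve_trio_site(father: int, mother: int, child: int) -> Optional[Tuple[int, int]]:
    """Return the child's (paternal, maternal) alleles at one SNP, or None.

    None means Mendelian rules alone cannot fix the phase (all three
    individuals heterozygous). Missing data and Mendelian inconsistencies
    are not checked in this sketch.
    """
    if child != 1:
        # Homozygous child: both transmitted alleles are the same.
        allele = child // 2
        return (allele, allele)
    if father in (0, 2):
        # A homozygous father can transmit only one allele; the other is maternal.
        paternal = father // 2
        return (paternal, 1 - paternal)
    if mother in (0, 2):
        # A homozygous mother fixes the maternal allele instead.
        maternal = mother // 2
        return (1 - maternal, maternal)
    return None


def r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Squared allelic correlation r^2 between SNP A and SNP B.

    Each haplotype is a pair (allele at A, allele at B), coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom
```

For example, `resolve_trio_site(2, 1, 1)` returns `(1, 0)`: the father can only transmit allele 1, so the child's allele 0 must be maternal. `r_squared([(0, 0), (1, 1), (0, 0), (1, 1)])` returns `1.0`, since the alleles at the two SNPs always co-occur.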

Similar articles

A haplotype inference algorithm for trios based on deterministic sampling.

BMC Genet. 2010 Aug 23;11:78. doi: 10.1186/1471-2156-11-78.

Characterisation of SNP haplotype structure in chemokine and chemokine receptor genes using CEPH pedigrees and statistical estimation.

Hum Genomics. 2004 Mar;1(3):195-207. doi: 10.1186/1479-7364-1-3-195.

2SNP: scalable phasing method for trios and unrelated individuals.

IEEE/ACM Trans Comput Biol Bioinform. 2008 Apr-Jun;5(2):313-8. doi: 10.1109/TCBB.2007.1068.

Using DNA pools for genotyping trios.

Nucleic Acids Res. 2006;34(19):e129. doi: 10.1093/nar/gkl700. Epub 2006 Oct 4.

A comparison of different algorithms for phasing haplotypes using Holstein cattle genotypes and pedigree data.

J Dairy Sci. 2017 Apr;100(4):2837-2849. doi: 10.3168/jds.2016-11590. Epub 2017 Feb 1.

Haplotype phasing and inheritance of copy number variants in nuclear families.

PLoS One. 2015 Apr 8;10(4):e0122713. doi: 10.1371/journal.pone.0122713. eCollection 2015.

A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals.

Am J Hum Genet. 2009 Feb;84(2):210-23. doi: 10.1016/j.ajhg.2009.01.005. Epub 2009 Feb 5.

A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase.

Am J Hum Genet. 2006 Apr;78(4):629-44. doi: 10.1086/502802. Epub 2006 Feb 17.

Genotype error biases trio-based estimates of haplotype phase accuracy.

Am J Hum Genet. 2022 Jun 2;109(6):1016-1025. doi: 10.1016/j.ajhg.2022.04.019.

Cited by

A genotype imputation reference panel specific for native Southeast Asian populations.

NPJ Genom Med. 2024 Oct 5;9(1):47. doi: 10.1038/s41525-024-00435-7.

Missing genotype imputation in non-model species using self-organizing maps.

Mol Ecol Resour. 2025 Apr;25(3):e13992. doi: 10.1111/1755-0998.13992. Epub 2024 Jul 6.

KnockoffHybrid: A knockoff framework for hybrid analysis of trio and population designs in genome-wide association studies.

Am J Hum Genet. 2024 Jul 11;111(7):1448-1461. doi: 10.1016/j.ajhg.2024.05.003. Epub 2024 May 30.

Direct Comparative Analysis of a Pharmacogenomics Panel with PacBio Hifi Long-Read and Illumina Short-Read Sequencing.

J Pers Med. 2023 Nov 27;13(12):1655. doi: 10.3390/jpm13121655.

ACCURATE CONSTRUCTION OF LONG RANGE HAPLOTYPE IN UNRELATED INDIVIDUALS.

Stat Sin. 2013;23:1441-1461. doi: 10.5705/ss.2012.141s.

Imputation of ancient human genomes.

Nat Commun. 2023 Jun 20;14(1):3660. doi: 10.1038/s41467-023-39202-0.

Haplotyping-Assisted Diploid Assembly and Variant Detection with Linked Reads.

Methods Mol Biol. 2023;2590:161-182. doi: 10.1007/978-1-0716-2819-5_11.

KnockoffTrio: A knockoff framework for the identification of putative causal variants in genome-wide association studies with trio design.

Am J Hum Genet. 2022 Oct 6;109(10):1761-1776. doi: 10.1016/j.ajhg.2022.08.013. Epub 2022 Sep 22.

North Asian population relationships in a global context.

Sci Rep. 2022 May 4;12(1):7214. doi: 10.1038/s41598-022-10706-x.

Benchmarking phasing software with a whole-genome sequenced cattle pedigree.

BMC Genomics. 2022 Feb 15;23(1):130. doi: 10.1186/s12864-022-08354-6.
