Shape-IT: new rapid and accurate algorithm for haplotype inference.

Author information

Olivier Delaneau, Cédric Coulonges, Jean-François Zagury

Affiliation

Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, 292 rue Saint-Martin, 75003 Paris, France.

Publication information

BMC Bioinformatics. 2008 Dec 16;9:540. doi: 10.1186/1471-2105-9-540.

PMID:19087329

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2647951/

Abstract

BACKGROUND

We have developed a new computational algorithm, Shape-IT, to infer haplotypes under the genetic model of coalescence with recombination developed by Stephens et al. in Phase v2.1. It runs much faster than Phase v2.1 while achieving the same accuracy. The major algorithmic improvements rely on the use of binary trees to represent the set of candidate haplotypes for each individual. These binary tree representations: (1) speed up the computation of the posterior probabilities of the haplotypes by avoiding the redundant operations made in Phase v2.1, and (2) overcome the exponential nature of the haplotype inference problem by selectively exploring the most plausible pathways (i.e., haplotypes) in the binary trees.
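The tree idea behind points (1) and (2) can be illustrated with a minimal Python sketch. This is not Shape-IT's code. It assumes a simplified haplotype-copying HMM in the style of Li and Stephens, with a constant recombination probability `rho` and mismatch probability `theta`, a panel of known reference haplotypes, genotypes coded 0/1/2 with no missing data, and hypothetical helper names (`forward_step`, `naive_forward`, `tree_forward`, `best_pair`). Every haplotype compatible with a genotype is a root-to-leaf path in a binary tree that branches only at heterozygous sites; each node stores the forward vector of its prefix, so a prefix shared by many candidates is extended once instead of once per candidate. The optional `beam` parameter, which keeps only the most likely nodes at each level, is a simple stand-in for point (2), not the exploration rule Shape-IT actually uses.

```python
# Hypothetical sketch: prefix-tree forward pass for a simplified haplotype-copying HMM.
# Not Shape-IT's implementation; rho, theta and the constant-rate model are assumptions.


def emission(allele, ref_allele, theta):
    """Probability of emitting `allele` while copying a template that carries `ref_allele`."""
    return 1.0 - theta if allele == ref_allele else theta


def forward_step(alpha, site, allele, panel, rho, theta):
    """Extend a forward vector (one entry per template haplotype) by one site."""
    k = len(panel)
    if alpha is None:  # first site: each template is equally likely to be copied
        return [emission(allele, hap[site], theta) / k for hap in panel]
    total = sum(alpha)
    # Stay on the same template with prob. 1 - rho, or jump to a uniformly chosen one with prob. rho.
    return [
        emission(allele, hap[site], theta) * ((1.0 - rho) * a + rho * total / k)
        for hap, a in zip(panel, alpha)
    ]


def naive_forward(hap, panel, rho=0.05, theta=0.01):
    """Likelihood of one full haplotype, computed from scratch (the redundant way)."""
    alpha = None
    for site, allele in enumerate(hap):
        alpha = forward_step(alpha, site, allele, panel, rho, theta)
    return sum(alpha)


def tree_forward(genotype, panel, rho=0.05, theta=0.01, beam=None):
    """Expand the binary tree of haplotypes compatible with a 0/1/2 genotype.

    Each node is (prefix, forward vector). A prefix shared by many candidate
    haplotypes is extended once. If `beam` is set, only the `beam` most likely
    nodes survive at each level. Returns ({haplotype: likelihood}, steps).
    """
    nodes = [((), None)]
    steps = 0
    for site, g in enumerate(genotype):
        alleles = (0, 1) if g == 1 else (g // 2,)  # 0 -> allele 0, 2 -> allele 1
        children = []
        for prefix, alpha in nodes:
            for allele in alleles:
                children.append((prefix + (allele,),
                                 forward_step(alpha, site, allele, panel, rho, theta)))
                steps += 1
        if beam is not None and len(children) > beam:
            children.sort(key=lambda node: sum(node[1]), reverse=True)
            children = children[:beam]
        nodes = children
    return {prefix: sum(alpha) for prefix, alpha in nodes}, steps


def best_pair(genotype, panel, **kwargs):
    """Most likely (h, complement) pair among surviving leaves, scored by P(h) * P(h')."""
    likes, _ = tree_forward(genotype, panel, **kwargs)
    best = None
    for hap, p in likes.items():
        comp = tuple(g - a for g, a in zip(genotype, hap))  # h + h' must equal the genotype
        if comp in likes and (best is None or p * likes[comp] > best[0]):
            best = (p * likes[comp], hap, comp)
    return best


panel = [(0, 1, 1, 0), (1, 0, 0, 1), (0, 1, 0, 0)]  # toy reference haplotypes
genotype = (1, 1, 1, 1)  # heterozygous at all 4 sites -> 2**4 = 16 candidates
likes, steps = tree_forward(genotype, panel)
print(steps, "tree steps vs", len(likes) * len(genotype), "for separate passes")
print(best_pair(genotype, panel))
```

Here the 16 candidates would cost 16 × 4 = 64 forward steps if each were evaluated separately, while the tree needs 2 + 4 + 8 + 16 = 30 because each shared prefix is computed once. Setting `beam` trades exactness for speed, and a pruned set can lose a haplotype's complement, which is one reason a real implementation needs a more careful exploration rule than this toy beam.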

RESULTS

Our results show that Shape-IT is several orders of magnitude faster than Phase v2.1 while being as accurate. For instance, Shape-IT runs 50 times faster than Phase v2.1 when computing the haplotypes of 200 subjects on 6,000 segments of 50 SNPs extracted from a standard Illumina 300K chip (13 days instead of 630 days). We also compared Shape-IT with other widely used software (Gerbil, PL-EM, Fastphase, 2SNP, and Ishape) in various tests: Shape-IT and Phase v2.1 were the most accurate in all cases, followed by Ishape and Fastphase. In terms of speed, Shape-IT was faster than Ishape and Fastphase on datasets of fewer than 100 SNPs, but Fastphase became faster, though still less accurate, at inferring haplotypes on larger SNP datasets.
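As a quick check on the reported run times, the speed-up and an approximate per-segment cost follow directly from the figures above. The per-segment values assume, for illustration only, that the total run time is spread evenly over the 6,000 segments of 50 SNPs.

```latex
\frac{630\ \text{days}}{13\ \text{days}} \approx 48.5 \approx 50,
\qquad
\frac{13 \times 86\,400\ \text{s}}{6\,000} = 187.2\ \text{s},
\qquad
\frac{630 \times 86\,400\ \text{s}}{6\,000} = 9\,072\ \text{s} \approx 2.5\ \text{h}
```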

CONCLUSION

Shape-IT deserves to be used extensively, not only for routine haplotype inference but also with data from the new high-throughput genotyping chips, since it makes it possible to fit the genetic model of Phase v2.1 to large datasets. This new algorithm based on tree representations could be used in other HMM-based haplotype inference software and may apply more broadly to other fields that use HMMs.

Similar articles

ISHAPE: new rapid and accurate software for haplotyping.

BMC Bioinformatics. 2007 Jun 15;8:205. doi: 10.1186/1471-2105-8-205.

2SNP: scalable phasing based on 2-SNP haplotypes.

Bioinformatics. 2006 Feb 1;22(3):371-3. doi: 10.1093/bioinformatics/bti785. Epub 2005 Nov 15.

Inference of missing SNPs and information quantity measurements for haplotype blocks.

Bioinformatics. 2005 May 1;21(9):2001-7. doi: 10.1093/bioinformatics/bti261. Epub 2005 Feb 4.

2SNP: scalable phasing method for trios and unrelated individuals.

IEEE/ACM Trans Comput Biol Bioinform. 2008 Apr-Jun;5(2):313-8. doi: 10.1109/TCBB.2007.1068.

Comparison of the accuracy of methods of computational haplotype inference using a large empirical dataset.

BMC Genet. 2004 Aug 3;5:22. doi: 10.1186/1471-2156-5-22.

A combined long-range phasing and long haplotype imputation method to impute phase for SNP genotypes.

Genet Sel Evol. 2011 Mar 10;43(1):12. doi: 10.1186/1297-9686-43-12.

Evaluation of two methods for computational HLA haplotypes inference using a real dataset.

BMC Bioinformatics. 2008 Jan 29;9:68. doi: 10.1186/1471-2105-9-68.

A haplotype inference algorithm for trios based on deterministic sampling.

BMC Genet. 2010 Aug 23;11:78. doi: 10.1186/1471-2156-11-78.

An improved preprocessing algorithm for haplotype inference by pure parsimony.

J Bioinform Comput Biol. 2014 Aug;12(4):1450020. doi: 10.1142/S0219720014500206. Epub 2014 Aug 1.

Cited by

Knockoff-Based Fine Mapping of MS-Associated SNPs in Sardinian Trios.

Biochem Genet. 2025 Aug 30. doi: 10.1007/s10528-025-11238-5.

Altered branched chain ketoacids underlie shared metabolic phenotypes in type 1 diabetes and maple syrup urine disease.

Commun Med (Lond). 2025 Jul 26;5(1):311. doi: 10.1038/s43856-025-01028-w.

Longitudinal sequencing reveals polygenic and epistatic nature of genomic response to selection.

Proc Natl Acad Sci U S A. 2025 Jun 24;122(25):e2410452122. doi: 10.1073/pnas.2410452122. Epub 2025 Jun 18.

Donor genetics and storage conditions influence mitochondrial DNA and extracellular vesicle levels in RBC units.

JCI Insight. 2025 Jun 10;10(14). doi: 10.1172/jci.insight.187792. eCollection 2025 Jul 22.

A structural haplotype in the 17q21.31 MAPT region is associated with increased risk for chronic traumatic encephalopathy endophenotypes.

Cell Rep Med. 2025 May 20;6(5):102084. doi: 10.1016/j.xcrm.2025.102084. Epub 2025 Apr 15.

Identifying individuals with rare disease variants by inferring shared ancestral haplotypes from SNP array data.

NAR Genom Bioinform. 2025 Apr 4;7(2):lqaf033. doi: 10.1093/nargab/lqaf033. eCollection 2025 Jun.

Integrative Computational Analysis of Common EXO5 Haplotypes: Impact on Protein Dynamics, Genome Stability, and Cancer Progression.

J Chem Inf Model. 2025 Apr 14;65(7):3640-3654. doi: 10.1021/acs.jcim.5c00067. Epub 2025 Mar 21.

Characterization of NAT, GST, and CYP2E1 Genetic Variation in Sub-Saharan African Populations: Implications for Treatment of Tuberculosis and Other Diseases.

Clin Pharmacol Ther. 2025 May;117(5):1338-1357. doi: 10.1002/cpt.3557. Epub 2025 Jan 20.

Red blood cell urate levels are linked to hemolysis in vitro and post-transfusion as a function of donor sex, population and genetic polymorphisms in SLC2A9 and ABCG2.

Transfusion. 2025 Mar;65(3):560-574. doi: 10.1111/trf.18140. Epub 2025 Jan 19.

Neuropathology-based approach reveals novel Alzheimer's Disease genes and highlights female-specific pathways and causal links to disrupted lipid metabolism: insights into a vicious cycle.

Acta Neuropathol Commun. 2025 Jan 4;13(1):1. doi: 10.1186/s40478-024-01909-6.

References

Common interleukin-6 promoter variants associate with the more severe forms of distal interphalangeal osteoarthritis.

Arthritis Res Ther. 2008;10(1):R21. doi: 10.1186/ar2374. Epub 2008 Feb 8.

Evaluation of two methods for computational HLA haplotypes inference using a real dataset.

BMC Bioinformatics. 2008 Jan 29;9:68. doi: 10.1186/1471-2105-9-68.

A second generation human haplotype map of over 3.1 million SNPs.

Nature. 2007 Oct 18;449(7164):851-61. doi: 10.1038/nature06258.

ISHAPE: new rapid and accurate software for haplotyping.

BMC Bioinformatics. 2007 Jun 15;8:205. doi: 10.1186/1471-2105-8-205.

A haplotype of the human CXCR1 gene protective against rapid disease progression in HIV-1+ patients.

Proc Natl Acad Sci U S A. 2007 Feb 27;104(9):3354-9. doi: 10.1073/pnas.0611670104. Epub 2007 Feb 21.

Exhaustive genotyping of the interleukin-1 family genes and associations with AIDS progression in a French cohort.

J Infect Dis. 2006 Dec 1;194(11):1492-504. doi: 10.1086/508545. Epub 2006 Oct 26.

A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase.

Am J Hum Genet. 2006 Apr;78(4):629-44. doi: 10.1086/502802. Epub 2006 Feb 17.

Associations of the IL2Ralpha, IL4Ralpha, IL10Ralpha, and IFN (gamma) R1 cytokine receptor genes with AIDS progression in a French AIDS cohort.

Immunogenetics. 2006 Apr;58(2-3):89-98. doi: 10.1007/s00251-005-0072-3. Epub 2006 Feb 21.

A comparison of phasing algorithms for trios and unrelated individuals.

Am J Hum Genet. 2006 Mar;78(3):437-50. doi: 10.1086/500808. Epub 2006 Jan 26.

Haplotypic structure of the X chromosome in the COGA population sample and the quality of its reconstruction by extant software packages.

BMC Genet. 2005 Dec 30;6 Suppl 1(Suppl 1):S77. doi: 10.1186/1471-2156-6-S1-S77.