Shape-IT: new rapid and accurate algorithm for haplotype inference.

Author information

Delaneau Olivier, Coulonges Cédric, Zagury Jean-François

Affiliation

Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, 292 rue Saint-Martin, 75003 Paris, France.

Publication information

BMC Bioinformatics. 2008 Dec 16;9:540. doi: 10.1186/1471-2105-9-540.

Abstract

BACKGROUND

We have developed a new computational algorithm, Shape-IT, to infer haplotypes under the genetic model of coalescence with recombination developed by Stephens et al. and implemented in Phase v2.1. It runs much faster than Phase v2.1 while achieving the same accuracy. The main algorithmic improvement is the use of binary trees to represent the set of candidate haplotypes for each individual. These binary tree representations (1) speed up the computation of the haplotypes' posterior probabilities by avoiding the redundant operations performed in Phase v2.1, and (2) overcome the exponential nature of the haplotype inference problem by selectively exploring the most plausible pathways (i.e., haplotypes) in the binary trees.
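To make the tree representation concrete, here is a minimal Python sketch, assuming genotypes are coded 0/1/2 (copies of allele 1) at biallelic SNPs. It builds the binary tree of candidate haplotypes for one individual, reads the candidates off its leaves, and counts the per-site steps needed when every shared prefix is processed only once. The names (`Node`, `build_candidate_tree`, `haplotypes`, `complement`, `count_edges`) are hypothetical, and the edge count is only a rough proxy for the work saved, not Shape-IT's own cost model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Haplotype = Tuple[int, ...]


@dataclass
class Node:
    """A tree node standing for one haplotype prefix; children are keyed by allele (0 or 1)."""
    children: Dict[int, "Node"] = field(default_factory=dict)


def build_candidate_tree(genotype: List[int]) -> Node:
    """Binary tree whose root-to-leaf paths are all haplotypes compatible with
    an unphased genotype coded 0/1/2 (number of copies of allele 1).
    Homozygous sites give every node one child; heterozygous sites give two."""
    root = Node()
    frontier = [root]
    for g in genotype:
        alleles = (0, 1) if g == 1 else (g // 2,)
        next_frontier = []
        for node in frontier:
            for a in alleles:
                node.children[a] = Node()
                next_frontier.append(node.children[a])
        frontier = next_frontier
    return root


def haplotypes(node: Node, prefix: Haplotype = ()) -> List[Haplotype]:
    """Read every candidate haplotype off the tree (one per leaf)."""
    if not node.children:
        return [prefix]
    result: List[Haplotype] = []
    for a in sorted(node.children):
        result.extend(haplotypes(node.children[a], prefix + (a,)))
    return result


def complement(genotype: List[int], h: Haplotype) -> Haplotype:
    """The second haplotype of the pair is fixed by the genotype: g - h at each site."""
    return tuple(g - a for g, a in zip(genotype, h))


def count_edges(node: Node) -> int:
    """Per-site steps needed when each edge is processed once and shared by all leaves below it."""
    return sum(1 + count_edges(child) for child in node.children.values())


if __name__ == "__main__":
    g = [1, 0, 1, 2, 1]                  # three heterozygous sites
    root = build_candidate_tree(g)
    hs = haplotypes(root)
    print(len(hs))                       # 8 candidates = 2**3
    print(count_edges(root))             # 20 shared per-site steps
    print(len(hs) * len(g))              # 40 steps if each candidate is scanned separately
    print(hs[0], complement(g, hs[0]))   # (0, 0, 0, 1, 0) (1, 0, 1, 1, 1)
```

The tree still has 2^h leaves for h heterozygous sites, so sharing prefixes trims redundant work without removing the exponential growth; that is what the selective exploration in point (2) addresses.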

RESULTS

Our results show that Shape-IT is several orders of magnitude faster than Phase v2.1 while being as accurate. For instance, Shape-IT runs 50 times faster than Phase v2.1 when computing the haplotypes of 200 subjects on 6,000 segments of 50 SNPs each, extracted from a standard Illumina 300K chip (13 days instead of 630 days). We also compared Shape-IT with other widely used software (Gerbil, PL-EM, Fastphase, 2SNP and Ishape) in various tests: Shape-IT and Phase v2.1 were the most accurate in all cases, followed by Ishape and Fastphase. In terms of speed, Shape-IT was faster than Ishape and Fastphase on datasets of fewer than 100 SNPs, but Fastphase became faster, though still less accurate, on larger SNP datasets.
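As a quick check on the timing example, the quoted run times work out as follows (the per-segment figures simply divide each total by the 6,000 segments, assuming time is spread evenly across segments):

$$\frac{630\ \text{days}}{13\ \text{days}} \approx 48.5 \approx 50\times, \qquad \frac{13 \times 1440\ \text{min}}{6000} \approx 3.1\ \text{min per segment (Shape-IT)}, \qquad \frac{630 \times 1440\ \text{min}}{6000} \approx 151\ \text{min per segment (Phase v2.1)}$$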

CONCLUSION

Shape-IT deserves to be used extensively, both for routine haplotype inference and with the new high-throughput genotyping chips, since it makes it possible to fit the genetic model of Phase v2.1 to large datasets. This new tree-based algorithm could be used in other HMM-based haplotype inference software and may apply more broadly to other fields that use HMMs.
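The two ideas from the Background (sharing HMM computations along tree prefixes and exploring only the most plausible branches) can be combined in a small sketch of a tree-structured forward pass. This is a hedged illustration, not Shape-IT's implementation: the simplified copying HMM (switch rate `rho`, mismatch rate `eps`), the pair score, the fixed beam width, and the function names `forward_step` and `tree_beam_phase` are all assumptions chosen for brevity; Phase v2.1's conditional model and Shape-IT's own exploration strategy are more elaborate.

```python
from typing import List, Sequence, Tuple

Haplotype = Tuple[int, ...]
Vector = List[float]


def forward_step(alpha: Vector, allele: int, refs: Sequence[Sequence[int]],
                 site: int, rho: float, eps: float) -> Vector:
    """Extend a forward vector by one site under an assumed copying HMM:
    keep copying the same reference with prob 1 - rho, jump to a uniformly
    chosen reference with prob rho, and emit the copied allele with prob 1 - eps."""
    total = sum(alpha)
    k = len(refs)
    out = []
    for i, ref in enumerate(refs):
        prior = (1.0 - rho) * alpha[i] + rho * total / k
        emit = 1.0 - eps if ref[site] == allele else eps
        out.append(prior * emit)
    return out


def tree_beam_phase(genotype: Sequence[int], refs: Sequence[Sequence[int]],
                    beam: int = 4, rho: float = 0.05,
                    eps: float = 0.01) -> Tuple[Haplotype, Haplotype]:
    """Walk the candidate-haplotype tree site by site. Each kept node stores the
    forward vectors of its prefix h and of the complementary prefix g - h, so a
    child costs one forward_step per haplotype instead of a rescan from site 0.
    Only the `beam` most likely pairs survive each level. No rescaling is done,
    which is fine for short toy inputs; long inputs would need it to avoid underflow."""
    k = len(refs)
    start = [1.0 / k] * k
    frontier = [((), start, start)]  # (prefix of h, forward(h), forward(g - h))
    for j, g in enumerate(genotype):
        alleles = (0, 1) if g == 1 else (g // 2,)
        children = []
        for prefix, a_h, a_c in frontier:
            for a in alleles:
                children.append((prefix + (a,),
                                 forward_step(a_h, a, refs, j, rho, eps),
                                 forward_step(a_c, g - a, refs, j, rho, eps)))
        # Score a pair by P(h prefix) * P(complement prefix), treating the two
        # haplotypes as independent given the references (a simplification).
        children.sort(key=lambda c: sum(c[1]) * sum(c[2]), reverse=True)
        frontier = children[:beam]
    best = frontier[0][0]
    return best, tuple(g - a for g, a in zip(genotype, best))


if __name__ == "__main__":
    refs = [(0, 0, 0, 1), (1, 0, 1, 1)]
    # Expect the pair (0, 0, 0, 1) and (1, 0, 1, 1), in some order.
    print(tree_beam_phase([1, 0, 1, 2], refs))
```

Because every child reuses its parent's stored forward vectors, the same recursion would work for any HMM whose hidden path is read off a tree of candidate sequences, which is the sense in which the approach may generalize beyond haplotyping.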
