Fosmid-based whole genome haplotyping of a HapMap trio child: evaluation of Single Individual Haplotyping techniques.

Affiliation

Department of Vertebrate Genomics, Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, D-14195 Berlin, Germany.

Publication information

Nucleic Acids Res. 2012 Mar;40(5):2041-53. doi: 10.1093/nar/gkr1042. Epub 2011 Nov 18.

Abstract

Determining the underlying haplotypes of individual human genomes is an essential, but currently difficult, step toward a complete understanding of genome function. Fosmid pool-based next-generation sequencing allows genome-wide generation of 40-kb haploid DNA segments, which can be phased into contiguous molecular haplotypes computationally by Single Individual Haplotyping (SIH). Many SIH algorithms have been proposed, but the accuracy of such methods has been difficult to assess due to the lack of real benchmark data. To address this problem, we generated whole genome fosmid sequence data from a HapMap trio child, NA12878, for which reliable haplotypes have already been produced. We assembled haplotypes using eight algorithms for SIH and carried out direct comparisons of their accuracy, completeness and efficiency. Our comparisons indicate that fosmid-based haplotyping can deliver highly accurate results even at low coverage and that our SIH algorithm, ReFHap, is able to efficiently produce high-quality haplotypes. We expanded the haplotypes for NA12878 by combining the current haplotypes with our fosmid-based haplotypes, producing near-to-complete new gold-standard haplotypes containing almost 98% of heterozygous SNPs. This improvement includes notable fractions of disease-related and GWA SNPs. Integrated with other molecular biological data sets, this phase information will advance the emerging field of diploid genomics.
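The abstract does not say which accuracy measure was used to compare the SIH algorithms. One measure commonly used for this kind of comparison is the switch error rate: the fraction of adjacent heterozygous SNP pairs whose relative phase disagrees with a reference haplotype. The sketch below is only an illustration under that assumption; the function names and the toy allele strings are hypothetical and are not taken from the paper or from ReFHap.

```python
def count_switch_errors(predicted, truth):
    """Count switch errors between two phasings of the same heterozygous SNPs.

    Each argument is a string of '0'/'1' giving the allele carried by one
    haplotype at each heterozygous site; the other haplotype is implied to
    be its complement. (Hypothetical encoding chosen for this example.)
    """
    if len(predicted) != len(truth):
        raise ValueError("phasings must cover the same sites")
    # True where the predicted haplotype carries the same allele as the reference.
    agree = [p == t for p, t in zip(predicted, truth)]
    # A switch error is a change in agreement between two adjacent sites.
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)


def switch_error_rate(predicted, truth):
    """Switch errors divided by the number of adjacent site pairs."""
    pairs = len(predicted) - 1
    if pairs < 1:
        return 0.0
    return count_switch_errors(predicted, truth) / pairs


if __name__ == "__main__":
    reference = "010110"   # made-up reference haplotype over six sites
    assembled = "010001"   # made-up assembly: phase flips after the third site
    print(count_switch_errors(assembled, reference))
    print(switch_error_rate(assembled, reference))
```

Because only relative phase matters, an assembled haplotype that is the exact complement of the reference counts zero switch errors under this definition.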

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8076/3299995/63388e68ca93/gkr1042f1.jpg
