

The diploid genome sequence of an individual human.

Authors

Levy Samuel, Sutton Granger, Ng Pauline C, Feuk Lars, Halpern Aaron L, Walenz Brian P, Axelrod Nelson, Huang Jiaqi, Kirkness Ewen F, Denisov Gennady, Lin Yuan, MacDonald Jeffrey R, Pang Andy Wing Chun, Shago Mary, Stockwell Timothy B, Tsiamouri Alexia, Bafna Vineet, Bansal Vikas, Kravitz Saul A, Busam Dana A, Beeson Karen Y, McIntosh Tina C, Remington Karin A, Abril Josep F, Gill John, Borman Jon, Rogers Yu-Hui, Frazier Marvin E, Scherer Stephen W, Strausberg Robert L, Venter J Craig

Affiliation

J. Craig Venter Institute, Rockville, Maryland, USA.

Publication

PLoS Biol. 2007 Sep 4;5(10):e254. doi: 10.1371/journal.pbio.0050254.

Abstract

Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels) (1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor; however, these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
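The 22% figure can be checked against the variant counts the abstract enumerates. The short sketch below tallies only the five categories that have explicit counts (SNPs, block substitutions, heterozygous and homozygous indels, inversions). Segmental duplications and copy number variation regions are left out because the abstract gives no numbers for them, so this is an approximation of the authors' tally, not a reproduction of their analysis. The dictionary keys are illustrative names, not identifiers from the paper.

```python
# Variant counts as enumerated in the abstract.
# Key names are illustrative labels, not terms defined by the paper.
variant_counts = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}

# Total of the enumerated events (excludes segmental duplications and
# copy number variation regions, for which no counts are stated).
total_events = sum(variant_counts.values())

# Everything that is not a SNP.
non_snp_events = total_events - variant_counts["snp"]
non_snp_share = non_snp_events / total_events

print(f"total enumerated events: {total_events:,}")
print(f"non-SNP events: {non_snp_events:,}")
print(f"non-SNP share of events: {non_snp_share:.1%}")
```

Under these assumptions the enumerated categories sum to 4,118,889 events, consistent with "more than 4.1 million DNA variants", and the non-SNP share rounds to 22%. The 74% share of variant bases cannot be checked the same way, because the abstract does not report base totals per category.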


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/55c0/2043021/3ac64afabffa/pbio.0050254.g001.jpg
