High-quality draft assemblies of mammalian genomes from massively parallel sequence data.

Affiliation

Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.

Publication information

Proc Natl Acad Sci U S A. 2011 Jan 25;108(4):1513-8. doi: 10.1073/pnas.1017351108. Epub 2010 Dec 27.

Abstract

Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
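The N50 figure quoted in the abstract is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The following is a minimal sketch of that calculation, not code from ALLPATHS-LG; the function name `n50` and the toy lengths are hypothetical and are not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length of the scaffold at which the running total,
    taken over scaffolds sorted from longest to shortest, first
    reaches half of the total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy, made-up scaffold lengths (illustrative only, not from the paper).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```

For the toy input the total is 54, so the running sum 10 + 9 + 8 = 27 reaches half of the total at the scaffold of length 8, and `toy_n50` is 8. A larger N50 means that more of the assembly sits in long scaffolds, which is why the abstract uses it to compare against capillary-based assemblies.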
