Assembly of the working draft of the human genome with GigAssembler.

Author information

Kent W J, Haussler D

Affiliation

Department of Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA.

Publication information

Genome Res. 2001 Sep;11(9):1541-8. doi: 10.1101/gr.183201.

PMID:11544197

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC311095/

Abstract

The data for the public working draft of the human genome contain roughly 400,000 initial sequence contigs in approximately 30,000 large insert clones. Many of these initial sequence contigs overlap. A program, GigAssembler, was built to merge them and to order and orient the resulting larger sequence contigs based on mRNA, paired plasmid ends, ESTs, BAC end pairs, and other information. This program produced the first publicly available assembly of the human genome: a working draft that contains roughly 2.7 billion base pairs, covers an estimated 88% of the genome, and has been used for several recent studies of the genome. Here we describe the algorithm used by GigAssembler.
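
The abstract states that merged contigs are ordered and oriented using evidence such as mRNA, paired plasmid ends, ESTs and BAC end pairs, but it does not spell out how. The sketch below is therefore not GigAssembler's method. It illustrates one generic way to assign consistent relative orientations from pairwise evidence: a union-find structure that tracks orientation parity, processes links strongest-first, and rejects links that contradict stronger ones. The `Link` tuple format, the weights and the contig names are hypothetical. Ordering contigs along the chromosome is not covered.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# HYPOTHETICAL link format (not from the paper): (contig_a, contig_b,
# same_orientation, weight). weight = amount of supporting evidence,
# e.g. the number of agreeing mate pairs or mRNA alignments.
Link = Tuple[str, str, bool, int]


class ParityUnionFind:
    """Union-find in which each node stores its orientation relative to its parent."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}
        self.parity: Dict[str, int] = {}  # 0 = same orientation as parent, 1 = flipped

    def find(self, x: str) -> Tuple[str, int]:
        """Return (root, parity of x relative to root), compressing the path."""
        if x not in self.parent:
            self.parent[x] = x
            self.parity[x] = 0
            return x, 0
        if self.parent[x] == x:
            return x, 0
        root, p = self.find(self.parent[x])
        self.parity[x] ^= p  # parity of x relative to root
        self.parent[x] = root
        return root, self.parity[x]

    def union(self, a: str, b: str, flipped: int) -> bool:
        """Record orientation(b) = orientation(a) XOR flipped.

        Returns False, changing nothing, if this contradicts earlier links.
        """
        ra, pa = self.find(a)
        rb, pb = self.find(b)
        if ra == rb:
            return (pa ^ pb) == flipped
        # Attach rb under ra so that b's parity relative to ra equals pa ^ flipped.
        self.parent[rb] = ra
        self.parity[rb] = pa ^ pb ^ flipped
        return True


def orient_contigs(links: List[Link]) -> Tuple[Dict[str, List[Tuple[str, str]]], List[Link]]:
    """Group linked contigs and give each a '+'/'-' orientation within its group."""
    uf = ParityUnionFind()
    rejected: List[Link] = []
    # Strongest evidence first, so the weaker of two conflicting links is dropped.
    for link in sorted(links, key=lambda l: -l[3]):
        a, b, same, _weight = link
        if not uf.union(a, b, 0 if same else 1):
            rejected.append(link)
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for contig in sorted(uf.parent):
        root, p = uf.find(contig)
        groups[root].append((contig, "+" if p == 0 else "-"))
    return dict(groups), rejected


if __name__ == "__main__":
    example = [
        ("c1", "c2", True, 5),
        ("c2", "c3", False, 4),
        ("c1", "c3", True, 1),  # contradicts the two stronger links above
        ("c4", "c5", True, 2),
    ]
    print(orient_contigs(example))
```

Orientations are only relative: flipping every contig in a group gives an equally consistent answer. Processing links by decreasing weight is a simple greedy choice. An assembler could instead weigh evidence types differently or resolve conflicts globally.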
