Kent W J, Haussler D
Department of Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA.
Genome Res. 2001 Sep;11(9):1541-8. doi: 10.1101/gr.183201.
The data for the public working draft of the human genome contain roughly 400,000 initial sequence contigs in approximately 30,000 large insert clones. Many of these initial sequence contigs overlap. A program, GigAssembler, was built to merge them and to order and orient the resulting larger sequence contigs based on mRNA, paired plasmid ends, EST, BAC end pairs, and other information. This program produced the first publicly available assembly of the human genome, a working draft containing roughly 2.7 billion base pairs and covering an estimated 88% of the genome. The draft has been used for several recent studies of the genome. Here we describe the algorithm used by GigAssembler.
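To make the idea of merging overlapping contigs concrete, the following is a minimal illustrative sketch. It is not GigAssembler's algorithm, which the abstract does not specify. It assumes error-free sequences, exact suffix-to-prefix overlaps, and a single strand orientation. The function names `overlap_length` and `merge_contigs` and the `min_overlap` parameter are hypothetical and chosen only for this example.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_contigs(contigs, min_overlap: int = 3):
    """Greedily merge the pair of contigs with the largest exact overlap,
    repeating until no pair overlaps by at least `min_overlap` bases.

    Assumption for illustration only: real assemblers must tolerate
    sequencing errors and consider reverse-complement orientations.
    """
    contigs = list(contigs)
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap_length(a, b, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        # Join the two contigs, keeping the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    print(merge_contigs(["ACGTAC", "TACGGA", "GGATTT"]))
```

The pairwise search here is quadratic in the number of contigs, which is acceptable for a toy example but would not scale to hundreds of thousands of contigs without indexing. Ordering and orienting the merged contigs from mRNA, EST, and end-pair evidence is a separate step that this sketch does not attempt.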