Zhao S, Malek J, Mahairas G, Fu L, Nierman W, Venter J C, Adams M D
The Institute for Genomic Research, Rockville, Maryland 20850, USA.
Genomics. 2000 Feb 1;63(3):321-32. doi: 10.1006/geno.1999.6082.
End sequences from bacterial artificial chromosomes (BACs) provide highly specific sequence markers in large-scale sequencing projects. To date, we have generated >300,000 end sequences from >186,000 human BAC clones with an average read length of >460 bp, for a total of 141 Mb covering approximately 4.7% of the genome. Over 60% of the clones have BAC end sequences (BESs) from both ends, representing more than fivefold coverage of the human genome by the paired-end clones. Our quality assessments and sequence analyses indicate that BESs from human BAC libraries developed at the California Institute of Technology (CalTech) and Roswell Park Cancer Institute have similar properties. The analyses have highlighted differences in insert size among segments of the CalTech library. Problems with the fidelity of tracking sequence data back to physical clones have been observed in some subsets of the overall BES dataset. Annotation of the BESs for their content of available genomic sequences, sequence-tagged sites, expressed sequence tags, protein-encoding regions, and repeats indicates that this resource will be valuable in many areas of genome research.
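The figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is an illustration only, not part of the original analysis: it takes the abstract's rounded lower bounds at face value and assumes the usual relation, clone coverage = (number of paired-end clones × mean insert size) / genome size. The genome size and mean insert size it derives are implied estimates, not values reported in the abstract, and all function and constant names are hypothetical.

```python
# Figures quoted in the abstract (lower bounds or approximations).
N_END_SEQUENCES = 300_000          # ">300,000 end sequences"
N_CLONES = 186_000                 # ">186,000 human BAC clones"
MEAN_READ_LENGTH_BP = 460          # ">460 bp" average read length
TOTAL_SEQUENCE_MB = 141            # total end sequence, in Mb
GENOME_FRACTION_SEQUENCED = 0.047  # "approximately 4.7% of the genome"
PAIRED_FRACTION = 0.60             # "over 60%" of clones sequenced at both ends
CLONE_COVERAGE_FOLD = 5            # "more than fivefold" paired-end clone coverage


def implied_genome_size_mb(total_mb: float, fraction: float) -> float:
    """Genome size implied by total sequence and the fraction it covers."""
    return total_mb / fraction


def implied_mean_insert_kb(genome_mb: float, fold: float, n_clones: float) -> float:
    """Mean insert size (kb) needed for n_clones to span the genome fold times.

    Assumes coverage = n_clones * insert_size / genome_size.
    """
    return genome_mb * 1000 * fold / n_clones


# Minimum total sequence implied by read count and mean read length.
min_total_mb = N_END_SEQUENCES * MEAN_READ_LENGTH_BP / 1e6

genome_mb = implied_genome_size_mb(TOTAL_SEQUENCE_MB, GENOME_FRACTION_SEQUENCED)
n_paired_clones = N_CLONES * PAIRED_FRACTION
insert_kb = implied_mean_insert_kb(genome_mb, CLONE_COVERAGE_FOLD, n_paired_clones)

print(f"Minimum total sequence: {min_total_mb:.0f} Mb")
print(f"Implied genome size: {genome_mb:.0f} Mb")
print(f"Paired-end clones: {n_paired_clones:.0f}")
print(f"Implied mean insert size: {insert_kb:.1f} kb")
```

Under these assumptions, the read count and read length give at least 138 Mb, in line with the stated 141 Mb total; the implied genome size is about 3,000 Mb; and fivefold coverage by roughly 111,600 paired-end clones corresponds to a mean insert of about 134 kb. Because every input is a rounded bound, these numbers are order-of-magnitude checks rather than measurements.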