Zhao S, Shatsman S, Ayodeji B, Geer K, Tsegaye G, Krol M, Gebregeorgis E, Shvartsbeyn A, Russell D, Overton L, Jiang L, Dimitrov G, Tran K, Shetty J, Malek J A, Feldblyum T, Nierman W C, Fraser C M
The Institute for Genomic Research, Rockville, Maryland 20850, USA.
Genome Res. 2001 Oct;11(10):1736-45. doi: 10.1101/gr.179201.
A large-scale BAC end-sequencing project at The Institute for Genomic Research (TIGR) has generated one of the most extensive sets of sequence markers for the mouse genome to date. Using ABI3700 capillary sequencers, with a sequencing success rate of >80% and an average read length of 485 bp, we have generated 449,234 nonredundant mouse BAC end sequences (mBESs) totaling 218 Mb from 257,318 clones from libraries RPCI-23 and RPCI-24, representing 15x clone coverage, 7% sequence coverage, and a marker every 7 kb across the genome. A total of 191,916 BACs have sequences from both ends, providing 12x genome coverage. The average Q20 length is 406 bp, and 84% of the bases have phred quality scores ≥20. RPCI-24 mBESs have more Q20 bases and longer reads on average than RPCI-23 sequences. ABI3700 sequencers and the sample tracking system ensure that >95% of mBESs are associated with the right clone identifiers. We have found that a significant fraction of mBESs contain L1 repeats and that approximately 48% of the clones have both ends with ≥100 bp of contiguous unique Q20 bases. About 3% of mBESs match ESTs, and >70% of matches were conserved between the mouse and the human or the rat. Approximately 0.1% of mBESs contain STSs. About 0.2% of mBESs match human finished sequences, and >70% of these sequences have EST hits. The analyses indicate that our high-quality mouse BAC end sequences will be a valuable resource to the community.
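The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not the authors' analysis: the read, clone, and base counts are taken from the abstract, while the genome size of roughly 3,000 Mb is an assumption that the abstract does not state, and the function name `summarize` is hypothetical.

```python
# Figures quoted in the abstract.
N_READS = 449_234        # nonredundant mouse BAC end sequences (mBESs)
TOTAL_BP = 218_000_000   # 218 Mb of end sequence in total
N_CLONES = 257_318       # clones with at least one end sequence
N_PAIRED = 191_916       # clones with sequences from both ends
MEAN_Q20_BP = 406        # average Q20 length per read
MEAN_READ_BP = 485       # average read length per read

# ASSUMPTION (not stated in the abstract): mouse genome size of ~3,000 Mb.
GENOME_BP = 3_000_000_000


def summarize(genome_bp=GENOME_BP):
    """Derive summary statistics from the quoted counts."""
    return {
        # Total bases divided by number of reads.
        "mean_read_bp": TOTAL_BP / N_READS,
        # Fraction of the assumed genome covered by end sequence, in percent.
        "sequence_coverage_pct": 100 * TOTAL_BP / genome_bp,
        # Average distance between markers if reads were evenly spaced.
        "marker_spacing_kb": genome_bp / N_READS / 1000,
        # Share of clones sequenced from both ends.
        "paired_clone_fraction": N_PAIRED / N_CLONES,
        # Ratio of average Q20 length to average read length.
        "q20_fraction": MEAN_Q20_BP / MEAN_READ_BP,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

Under the assumed genome size, the derived values come out at about 485 bp per read, about 7% sequence coverage, and a marker roughly every 7 kb, in line with the abstract. The ratio of Q20 length to read length is also close to the reported 84% of bases with phred scores ≥20. A different genome-size assumption would shift the coverage and spacing values proportionally.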