Cameron R A, Mahairas G, Rast J P, Martinez P, Biondi T R, Swartzell S, Wallace J C, Poustka A J, Livingston B T, Wray G A, Ettensohn C A, Lehrach H, Britten R J, Davidson E H, Hood L
Stowers Institute for Medical Research, Kansas City, MO 64110, USA.
Proc Natl Acad Sci U S A. 2000 Aug 15;97(17):9514-8. doi: 10.1073/pnas.160261897.
Results of a first-stage Sea Urchin Genome Project are summarized here. The species chosen was Strongylocentrotus purpuratus, a research model of major importance in developmental and molecular biology. A virtual map of the genome was constructed by sequencing the ends of 76,020 bacterial artificial chromosome (BAC) recombinants (average length, 125 kb). The BAC-end sequence tag connectors (STCs) occur an average of 10 kb apart, and, together with restriction digest patterns recorded for the same BAC clones, they provide immediate access to contigs of several hundred kilobases surrounding any gene of interest. The STCs survey >5% of the genome and provide the estimate that this genome contains approximately 27,350 protein-coding genes. The frequency distribution and canonical sequences of all middle and highly repetitive sequence families in the genome were obtained from the STCs as well. The 500-kb Hox gene complex of this species is being sequenced in its entirety. In addition, arrayed cDNA libraries of >10^5 clones each were constructed from every major stage of embryogenesis, several individual cell types, and adult tissues and are available to the community. The accumulated STC data and an expanding expressed sequence tag database (at present including >12,000 sequences) have been reported to GenBank and are accessible on public web sites.
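The quantities above (BAC clone count, mean insert length, spacing of end reads, fraction of the genome surveyed, repeat-family frequencies) are linked by standard random-sampling arithmetic. The sketch below is an illustration of that arithmetic, not the authors' pipeline. It assumes that end reads land uniformly at random, that each BAC contributes two end reads, and that read coverage follows the Poisson (Lander-Waterman) approximation. Genome size and mean read length are not stated in this abstract, so they are left as parameters. All function names are hypothetical.

```python
import math


def clone_coverage(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Fold coverage c = N * L / G contributed by the BAC inserts."""
    return n_clones * mean_insert_bp / genome_bp


def mean_end_spacing(n_clones: int, genome_bp: float, ends_per_clone: int = 2) -> float:
    """Expected mean distance between adjacent end reads, assuming the
    reads are scattered uniformly at random over the genome."""
    return genome_bp / (n_clones * ends_per_clone)


def fraction_sampled(n_reads: int, mean_read_bp: float, genome_bp: float) -> float:
    """Fraction of the genome touched by at least one read, using the
    Poisson approximation 1 - exp(-c) with read coverage c = n * l / G."""
    c = n_reads * mean_read_bp / genome_bp
    return 1.0 - math.exp(-c)


def repeat_copy_number(bases_matching_family: int, total_sampled_bases: int,
                       genome_bp: float, unit_length_bp: float) -> float:
    """Estimated copies of a repeat family: the fraction of randomly sampled
    bases that match the family, scaled to the whole genome and divided by
    the length of one repeat unit."""
    genome_fraction = bases_matching_family / total_sampled_bases
    return genome_fraction * genome_bp / unit_length_bp


if __name__ == "__main__":
    # Toy numbers only; these are not values from the sea urchin project.
    G = 1_000_000  # hypothetical genome size (bp)
    print(clone_coverage(100, 20_000, G))       # 100 clones of 20 kb
    print(mean_end_spacing(100, G))             # 200 end reads
    print(fraction_sampled(1_000, 500, G))      # 1,000 reads of 500 bp
    print(repeat_copy_number(5_000, 100_000, G, 500))
```

Plugging in the abstract's 76,020 clones and 125-kb mean insert length, together with a genome size taken from an independent source, gives the clone coverage of the BAC library. Note that the 1 - exp(-c) form describes uniform random reads; real surveys deviate from it where sequences cannot be uniquely placed.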