Gu Z, Hillier L, Kwok PY
Division of Dermatology, Washington University School of Medicine, St. Louis, Missouri 63110, USA.
Hum Mutat. 1998;12(4):221-5. doi: 10.1002/(SICI)1098-1004(1998)12:4<221::AID-HUMU1>3.0.CO;2-I.
Large-scale sequencing of human cDNA and genomic DNA libraries has produced a large collection of sequence data in public databases. To date, >900,000 human expressed sequence tags (ESTs) and >80,000,000 bases of genomic DNA sequence have been deposited in GenBank. This ever-expanding data set is a rich source of gene-associated and anonymous single nucleotide polymorphisms (SNPs). DNA sequence variations can be found by comparing the sequences of redundant ESTs and by comparing sequences from overlapping genomic clones. Initial studies have shown that, with proper computer screening, informative SNP markers can be developed from these DNA databases efficiently and cost-effectively. Complete public access to these databases will allow individual investigators to add biological value to the human sequence data generated by large-scale sequencing centers.
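As a rough illustration of the idea of comparing redundant sequences, the sketch below scans a set of reads that are assumed to be already aligned to equal length and reports columns where two or more bases are each seen repeatedly. It is not the authors' screening pipeline: the function name `candidate_snps`, the `min_minor_count` threshold, and the toy reads are all hypothetical, and requiring each allele to appear at least twice is only an assumed, simple guard against isolated sequencing errors.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """Return (position, {base: count}) for columns that look polymorphic.

    Assumes the reads are pre-aligned and of equal length (hypothetical input).
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to equal length")

    hits = []
    for pos in range(length):
        # Count bases in this alignment column, ignoring case.
        counts = Counter(read[pos].upper() for read in aligned_reads)
        # Keep only unambiguous bases seen at least min_minor_count times.
        supported = {
            base: n
            for base, n in counts.items()
            if base in "ACGT" and n >= min_minor_count
        }
        # Two or more well-supported bases suggest a candidate SNP.
        if len(supported) >= 2:
            hits.append((pos, supported))
    return hits


# Toy, made-up reads covering the same region.
reads = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACGCTAGC",
    "ACGCTAGC",
    "ACGTTANC",
    "ACGTTAGA",
]
print(candidate_snps(reads))  # [(3, {'T': 4, 'C': 2})]
```

In this toy input, the single `A` at the last position is not reported because it is seen only once, which is the kind of singleton difference a screening step would likely treat as a possible sequencing error rather than a polymorphism.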