Adams M D, Kerlavage A R, Fleischmann R D, Fuldner R A, Bult C J, Lee N H, Kirkness E F, Weinstock K G, Gocayne J D, White O
Institute for Genomic Research, Rockville, Maryland 20850, USA.
Nature. 1995 Sep 28;377(6547 Suppl):3-174.
In an effort to identify new genes and analyse their expression patterns, 174,472 partial complementary DNA sequences (expressed sequence tags (ESTs)), totalling more than 52 million nucleotides of human DNA sequence, have been generated from 300 cDNA libraries constructed from 37 distinct organs and tissues. These ESTs have been combined with an additional 118,406 ESTs from the database dbEST, for a total of 83 million nucleotides, and treated as a shotgun sequence assembly project. The assembly process yielded 29,599 distinct tentative human consensus (THC) sequences and 58,384 non-overlapping ESTs. Of these 87,983 distinct sequences, 10,214 further characterize previously known genes based on statistically significant similarity to sequences in the available databases; the remainder identify previously unknown genes. Thirty tissues were sampled by over 1,000 ESTs each; only eight genes were matched by ESTs from all 30 tissues, and 227 genes were represented in 20 or more of the tissues sampled with more than 1,000 ESTs. Approximately 40% of identified human genes appear to be associated with basic energy metabolism, cell structure, homeostasis and cell division, 22% with RNA and protein synthesis and processing, and 12% with cell signalling and communication.
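The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and key names are hypothetical, and the derived figures (the combined EST count, the number of sequences not matched to known genes, the average EST lengths, and the unitemized share of functional categories) are not stated in the abstract but follow from the numbers it gives. Because the nucleotide totals are given only approximately ("more than 52 million", "83 million"), the average lengths are rough estimates.

```python
# Figures taken directly from the abstract.
NEW_ESTS = 174_472              # ESTs generated in this study
DBEST_ESTS = 118_406            # additional ESTs taken from dbEST
NEW_NUCLEOTIDES = 52_000_000    # "more than 52 million" (approximate lower bound)
TOTAL_NUCLEOTIDES = 83_000_000  # "83 million" (approximate)
THC_SEQUENCES = 29_599          # tentative human consensus sequences
SINGLETON_ESTS = 58_384         # non-overlapping ESTs
KNOWN_GENE_MATCHES = 10_214     # sequences matching previously known genes
CATEGORY_PERCENT = {            # functional categories named in the abstract
    "metabolism, structure, homeostasis, division": 40,
    "RNA and protein synthesis and processing": 22,
    "cell signalling and communication": 12,
}


def summarize_counts():
    """Derive totals implied by the abstract (hypothetical helper)."""
    total_ests = NEW_ESTS + DBEST_ESTS
    distinct = THC_SEQUENCES + SINGLETON_ESTS
    unmatched = distinct - KNOWN_GENE_MATCHES
    return {
        "total_ests": total_ests,
        "distinct_sequences": distinct,
        "unmatched_sequences": unmatched,
        "known_fraction": KNOWN_GENE_MATCHES / distinct,
        # Rough averages, limited by the rounded nucleotide totals.
        "mean_new_est_length": NEW_NUCLEOTIDES / NEW_ESTS,
        "mean_combined_est_length": TOTAL_NUCLEOTIDES / total_ests,
        # Share of identified genes not itemized in the abstract.
        "unitemized_percent": 100 - sum(CATEGORY_PERCENT.values()),
    }


if __name__ == "__main__":
    for name, value in summarize_counts().items():
        print(f"{name}: {value}")
```

The sum of THC sequences and non-overlapping ESTs reproduces the 87,983 distinct sequences reported in the abstract, which is a useful consistency check on the quoted figures.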