Nagase T, Ishikawa K, Nakajima D, Ohira M, Seki N, Miyajima N, Tanaka A, Kotani H, Nomura N, Ohara O
Kazusa DNA Research Institute, Chiba, Japan.
DNA Res. 1997 Apr 28;4(2):141-50. doi: 10.1093/dnares/4.2.141.
As part of our ongoing project to sequence human cDNA clones that correspond to relatively long transcripts, we determined the complete sequences of 100 new cDNA clones, selected for their potential to encode large proteins in vitro. The clones came from size-fractionated human brain cDNA libraries, specifically the fractions with average insert sizes of 5.3 to 7.0 kb. Randomly sampled clones were single-pass sequenced from both ends to identify those not yet registered in public databases. Their protein-coding potential was then examined with an in vitro transcription/translation system, and clones that generated proteins larger than 60 kDa were sequenced in full. Each clone contained a distinct open reading frame (ORF), whose length agreed roughly with the molecular mass of the in vitro product estimated from its mobility in SDS-polyacrylamide gel electrophoresis. The sequenced cDNA clones averaged 6.1 kb, and the ORFs averaged 1200 amino acid residues. Computer-assisted comparison of the sequences against DNA and protein-motif databases (GenBank and PROSITE) allowed the functions of at least 73% of the gene products to be predicted; 88% of these (the products of 64 clones) fell into functional categories related to cell signaling/communication, nucleic acid management, and cell structure/motility. The expression profiles across a variety of tissues and the chromosomal locations of the sequenced clones were also determined. According to these profiles, approximately 11 genes appeared to be expressed predominantly in brain. Most of the remaining genes fell into one of two classes: expression in a limited number of tissues (31 genes) or ubiquitous expression in all but a few tissues (47 genes).
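The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration: the average residue mass of about 110 Da is a common rule of thumb and is not stated in the abstract, and all function and variable names are hypothetical.

```python
# Rough consistency check of the numbers quoted in the abstract.
# ASSUMPTION: ~110 Da per amino acid residue is a textbook rule of thumb,
# not a value taken from the abstract.
AVG_RESIDUE_MASS_DA = 110.0


def residues_to_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa for a given number of residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


def kda_to_residues(mass_kda: float) -> int:
    """Approximate number of residues for a given protein mass in kDa."""
    return round(mass_kda * 1000.0 / AVG_RESIDUE_MASS_DA)


def orf_length_nt(n_residues: int) -> int:
    """Coding length in nucleotides (three per residue, stop codon excluded)."""
    return 3 * n_residues


# Mean ORF of 1200 residues -> approximate mass of the translated product.
mean_orf_kda = residues_to_kda(1200)

# The 60 kDa selection threshold expressed as an approximate ORF length.
min_residues = kda_to_residues(60)

# Share of the mean 6.1 kb clone that the mean ORF occupies.
coding_fraction = orf_length_nt(1200) / 6100

# "At least 73%" of 100 clones, and 88% of those.
n_annotated = round(0.73 * 100)
n_in_three_categories = round(0.88 * n_annotated)

print(f"mean ORF mass: ~{mean_orf_kda:.0f} kDa")
print(f"60 kDa threshold: ~{min_residues} residues")
print(f"coding fraction of mean clone: {coding_fraction:.0%}")
print(f"annotated clones: {n_annotated}, in three main categories: {n_in_three_categories}")
```

Under these assumptions, 88% of 73 clones rounds to 64, which matches the count given in the abstract, and a mean ORF of 1200 residues accounts for roughly 3.6 kb of the mean 6.1 kb insert.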