Wiemann S, Weil B, Wellenreuther R, Gassenhuber J, Glassl S, Ansorge W, Böcher M, Blöcker H, Bauersachs S, Blum H, Lauber J, Düsterhöft A, Beyer A, Köhrer K, Strack N, Mewes H W, Ottenwälder B, Obermaier B, Tampe J, Heubner D, Wambutt R, Korn B, Klein M, Poustka A
Molecular Genome Analysis, German Cancer Research Center, 69120 Heidelberg, Germany.
Genome Res. 2001 Mar;11(3):422-35. doi: 10.1101/gr.gr1547r.
With the complete human genomic sequence being unraveled, the focus will shift to gene identification and to the functional analysis of gene products. The generation of a set of cDNAs, both sequences and physical clones, which contains the complete and noninterrupted protein coding regions of all human genes will provide the indispensable tools for the systematic and comprehensive analysis of protein function to eventually understand the molecular basis of man. Here we report the sequencing and analysis of 500 novel human cDNAs containing the complete protein coding frame. Assignment to functional categories was possible for 52% (259) of the encoded proteins, the remaining fraction having no similarities with known proteins. By aligning the cDNA sequences with the sequences of the finished chromosomes 21 and 22 we identified a number of genes that either had been completely missed in the analysis of the genomic sequences or had been wrongly predicted. Three of these genes appear to be present in several copies. We conclude that full-length cDNA sequencing continues to be crucial also for the accurate identification of genes. The set of 500 novel cDNAs, and another 1000 full-coding cDNAs of known transcripts we have identified, adds up to cDNA representations covering 2% to 5% of all human genes. We thus substantially contribute to the generation of a gene catalog, consisting of both full-coding cDNA sequences and clones, which should be made freely available and will become an invaluable tool for detailed functional studies.