Gene index analysis of the human genome estimates approximately 120,000 genes.

Authors

Liang F, Holt I, Pertea G, Karamycheva S, Salzberg S L, Quackenbush J

Affiliation

The Institute for Genomic Research, Rockville, Maryland, USA.

Publication

Nat Genet. 2000 Jun;25(2):239-40. doi: 10.1038/76126.

Abstract

Although sequencing of the human genome will soon be completed, gene identification and annotation remain a challenge. Early estimates suggested that there might be 60,000-100,000 (ref. 1) human genes, but recent analyses of the available data from EST sequencing projects have estimated as few as 45,000 (ref. 2) or as many as 140,000 (ref. 3) distinct genes. The Chromosome 22 Sequencing Consortium estimated a minimum of 45,000 genes based on their annotation of the complete chromosome, although their data suggest there may be additional genes. The nearly 2,000,000 human ESTs in dbEST provide an important resource for gene identification and genome annotation, but these single-pass sequences must be carefully analysed to remove contaminating sequences, including those from genomic DNA, spurious transcription, and vector and bacterial sequences. We have developed a highly refined and rigorously tested protocol for cleaning, clustering and assembling EST sequences to produce high-fidelity consensus sequences for the represented genes (F.L. et al., manuscript submitted) and used this to create the TIGR Gene Indices, databases of expressed genes for human, mouse, rat and other species (http://www.tigr.org/tdb/tgi.html). Using highly refined and tested algorithms for EST analysis, we have arrived at two independent estimates indicating that the human genome contains approximately 120,000 genes.
