Open-reading-frame sequence tags (OSTs) support the existence of at least 17,300 genes in C. elegans.

Authors

Reboul J, Vaglio P, Tzellas N, Thierry-Mieg N, Moore T, Jackson C, Shin-i T, Kohara Y, Thierry-Mieg D, Thierry-Mieg J, Lee H, Hitti J, Doucette-Stamm L, Hartley J L, Temple G F, Brasch M A, Vandenhaute J, Lamesch P E, Hill D E, Vidal M

Affiliation

Dana-Farber Cancer Institute and Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA.

Publication

Nat Genet. 2001 Mar;27(3):332-6. doi: 10.1038/85913.

Abstract

The genome sequences of Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana have been predicted to contain 19,000, 13,600 and 25,500 genes, respectively. Before this information can be fully used for evolutionary and functional studies, several issues need to be addressed. First, the gene number estimates obtained in silico and not yet supported by any experimental data need to be verified. For example, it seems biologically paradoxical that C. elegans would have 50% more genes than Drosophila. Second, intron/exon predictions need to be tested experimentally. Third, complete sets of open reading frames (ORFs), or "ORFeomes," need to be cloned into various expression vectors. To address these issues simultaneously, we have designed and applied to C. elegans the following strategy. Predicted ORFs are amplified by PCR from a highly representative cDNA library using ORF-specific primers, cloned by Gateway recombination cloning and then sequenced to generate ORF sequence tags (OSTs) as a way to verify identity and splicing. In a sample (n=1,222) of the nearly 10,000 genes predicted ab initio (that is, for which no expressed sequence tag (EST) is available so far), at least 70% were verified by OSTs. We also observed that 27% of these experimentally confirmed genes have a structure different from that predicted by GeneFinder. We now have experimental evidence that supports the existence of at least 17,300 genes in C. elegans. Hence we suggest that gene counts based primarily on ESTs may underestimate the number of genes in human and in other organisms.
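
To give a sense of how much sampling uncertainty attaches to the verification rate, the figures in the abstract (n=1,222, at least 70% verified, nearly 10,000 ab initio predictions) can be put through a standard interval estimate. The sketch below is illustrative only: it treats 70% as the point estimate and 10,000 as the population size, both simplifying assumptions, and it uses a Wilson score interval, which the abstract does not mention. It does not reproduce the 17,300 figure, whose derivation is not given in the abstract.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

# Figures taken from the abstract; using them as exact values is an assumption.
SAMPLE_SIZE = 1222          # sampled ab initio predictions
VERIFIED_FRACTION = 0.70    # abstract says "at least 70%"
AB_INITIO_GENES = 10_000    # abstract says "nearly 10,000"

verified_in_sample = round(SAMPLE_SIZE * VERIFIED_FRACTION)
low, high = wilson_interval(verified_in_sample, SAMPLE_SIZE)

# Scale the interval to the full set of EST-less predictions.
projected_low = low * AB_INITIO_GENES
projected_high = high * AB_INITIO_GENES

print(f"verified in sample: {verified_in_sample} of {SAMPLE_SIZE}")
print(f"95% interval for verified fraction: {low:.3f} to {high:.3f}")
print(f"projected verified ab initio genes: {projected_low:.0f} to {projected_high:.0f}")
```

With a sample of this size the interval is only a few percentage points wide, so the projection across the EST-less predictions is fairly tight under these assumptions.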
