Camargo A A, Samaia H P, Dias-Neto E, Simão D F, Migotto I A, Briones M R, Costa F F, Nagai M A, Verjovski-Almeida S, Zago M A, Andrade L E, Carrer H, El-Dorry H F, Espreafico E M, Habr-Gama A, Giannella-Neto D, Goldman G H, Gruber A, Hackel C, Kimura E T, Maciel R M, Marie S K, Martins E A, Nobrega M P, Paco-Larson M L, Pardini M I, Pereira G G, Pesquero J B, Rodrigues V, Rogatto S R, da Silva I D, Sogayar M C, Sonati M F, Tajara E H, Valentini S R, Alberto F L, Amaral M E, Aneas I, Arnaldi L A, de Assis A M, Bengtson M H, Bergamo N A, Bombonato V, de Camargo M E, Canevari R A, Carraro D M, Cerutti J M, Correa M L, Correa R F, Costa M C, Curcio C, Hokama P O, Ferreira A J, Furuzawa G K, Gushiken T, Ho P L, Kimura E, Krieger J E, Leite L C, Majumder P, Marins M, Marques E R, Melo A S, Melo M B, Mestriner C A, Miracca E C, Miranda D C, Nascimento A L, Nobrega F G, Ojopi E P, Pandolfi J R, Pessoa L G, Prevedel A C, Rahal P, Rainho C A, Reis E M, Ribeiro M L, da Ros N, de Sa R G, Sales M M, Sant'anna S C, dos Santos M L, da Silva A M, da Silva N P, Silva W A, da Silveira R A, Sousa J F, Stecconi D, Tsukumo F, Valente V, Soares F, Moreira E S, Nunes D N, Correa R G, Zalcberg H, Carvalho A F, Reis L F, Brentani R R, Simpson A J, de Souza S J
Ludwig Institute for Cancer Research, 01509-010, São Paulo, Brazil.
Proc Natl Acad Sci U S A. 2001 Oct 9;98(21):12103-8. doi: 10.1073/pnas.201182798.
Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein-coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components by reverse transcription-PCR offers a direct route to transcript finishing and may be a useful alternative to full-length cDNA cloning.