Ferreira Elisa N, Pires Lilian C, Parmigiani Raphael B, Bettoni Fabiana, Puga Renato D, Pinheiro Daniel G, Andrade Luís Eduardo C, Cruz Luciana O, Degaki Theri L, Faria Milton, Festa Fernanda, Giannella-Neto Daniel, Giorgi Ricardo R, Goldman Gustavo H, Granja Fabiana, Gruber Arthur, Hackel Christine, Henrique-Silva Flávio, Malnic Bettina, Manzini Carina V B, Marie Suely K N, Martinez-Rossi Nilce M, Oba-Shinjo Sueli M, Pardini Maria Ines M C, Rahal Paula, Rainho Cláudia A, Rogatto Silvia R, Romano Camila M, Rodrigues Vanderlei, Sales Magaly M, Savoldi Marcela, da Silva Ismael D C G, da Silva Neusa P, de Souza Sandro J, Tajara Eloiza H, Silva Wilson A, Simpson Andrew J G, Sogayar Mari C, Camargo Anamaria A, Carraro Dirce M
Laboratory of Molecular Biology and Genomics, Ludwig Institute for Cancer Research, São Paulo, SP, Brazil.
Genet Mol Res. 2004 Dec 30;3(4):493-511.
The correct identification of all human genes, and their derived transcripts, has not yet been achieved, and it remains one of the major aims of the worldwide genomics community. Computational programs suggest the existence of 30,000 to 40,000 human genes. However, definitive gene identification can only be achieved by experimental approaches. We used two distinct methodologies, one based on the alignment of mouse orthologous sequences to the human genome, and another based on the construction of a high-quality human testis cDNA library, in an attempt to identify new human transcripts within the human genome sequence. We generated 47 complete human transcript sequences, comprising 27 unannotated and 20 annotated sequences. Eight of these transcripts are variants of previously known genes. These transcripts were characterized according to size, number of exons, and chromosomal localization, and a search for protein domains was undertaken based on their putative open reading frames. In silico expression analysis suggests that some of these transcripts are expressed at low levels and in a restricted set of tissues.