Nordström Karl J V, Mirza Majd A I, Larsson Thomas P, Gloriam David E I, Fredriksson Robert, Schiöth Helgi B
Department of Neuroscience, Uppsala University, BMC Box 593, 751 24 Uppsala, Sweden.
Biochem Biophys Res Commun. 2006 Sep 29;348(3):1063-74. doi: 10.1016/j.bbrc.2006.07.153. Epub 2006 Aug 4.
Our understanding of functional genetic elements in genomes is continuously growing, and new entries are added to various databases on a regular basis. Here we merged the genetic elements in RefSeq, Ensembl, FANTOM3, HINV, and NCBI's ESTdb using the genome assemblies, in order to obtain a comprehensive picture of the current status of gene identity and gene number in human, mouse, and rat. The number of human protein-coding genes has not increased (25,043), whereas the increased sequencing of mouse transcripts has yielded a considerably higher number of protein-coding genes (31,578) in mouse. The results indicate large discrepancies between the datasets, as considerable numbers of unique transcripts can be found in each dataset. Despite the high number of ncRNAs (38,129 in mouse), there are also almost 20,000 EST clusters containing more than one EST in both mouse and human that do not overlap any transcript, suggesting that several new genetic elements remain to be found. We also demonstrated the presence of new genes by identifying novel human genes with specific tissue expression profiles, using RT-PCR on rat tissues.