Sakharkar Meena Kishore, Kangueane Pandjassarame
Nanyang Centre for Supercomputing and Visualization, School of Mechanical and Production Engineering, Nanyang Technological University, Singapore 639798.
BMC Bioinformatics. 2004 Jun 2;5:67. doi: 10.1186/1471-2105-5-67.
Complete genome sequences for a number of eukaryotes are available in the public domain. Eukaryotic genes either contain introns or are 'intronless'. Eukaryotic 'intronless' genes are useful datasets for comparative genomics and evolutionary studies. The SEGE database, a collection of eukaryotic single exon genes, is already available. However, SEGE is derived from GenBank, and the redundancy, incompleteness and heterogeneity of GenBank data are a bottleneck for biological investigation in comparative genomics and evolutionary studies. Such studies often require a representative gene set from each genome, which is possible only by deriving specific datasets from completely sequenced genome data. Genome SEGE, a database of 'intronless' genes in completely sequenced eukaryotic genomes, has therefore been constructed.
http://sege.ntu.edu.sg/wester/intronless
Eukaryotic 'intronless' genes were extracted from nine completely sequenced genomes (four unicellular and five multicellular). The complete dataset is available for download, as are data subsets for 'intronless' pseudogenes. The database reports the distribution of 'intronless' genes across the genomes, together with their length distribution in each genome. A search facility is available through the web server; it provides pre-computed PROSITE motifs for each sequence in the database, with hyperlinks to InterPro.
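The abstract does not describe how 'intronless' genes were identified or how their lengths were tabulated. As an illustration only, the following minimal Python sketch assumes that coding features are given as GenBank-style location strings, treats any location without a `join(...)` or `order(...)` operator as a single exon gene, and bins the lengths of those genes. The function names, the bin size and the example locations are hypothetical and are not taken from Genome SEGE.

```python
import re
from collections import Counter


def is_single_exon(location: str) -> bool:
    """Assumption: a CDS location without join()/order() is one contiguous exon."""
    return "join(" not in location and "order(" not in location


def span_length(location: str) -> int:
    """Length in bases of a single span such as '100..400' or 'complement(100..400)'."""
    match = re.search(r"<?(\d+)\.\.>?(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    start, end = (int(value) for value in match.groups())
    return end - start + 1


def length_histogram(locations, bin_size=300):
    """Count single exon genes per length bin, keyed by the lower edge of each bin."""
    counts = Counter()
    for location in locations:
        if is_single_exon(location):
            counts[(span_length(location) // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))


# Hypothetical example locations, not real annotation data.
example_locations = [
    "100..400",
    "complement(1000..1899)",
    "join(10..50,100..200)",
    "5..154",
]
histogram = length_histogram(example_locations)
print(histogram)
```

In this sketch the `join(...)` entry is skipped as a multi-exon gene and the remaining three genes fall into separate length bins. A real pipeline would also need to handle pseudogene annotations and partial features, which are not modelled here.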
The feature that distinguishes Genome SEGE from SEGE is that it provides representative 'intronless' datasets for completely sequenced genomes. The 'intronless' gene sets in this database will be useful for subsequent bio-computational analysis in comparative genomics and evolutionary studies. Such analysis may prompt re-examination and re-annotation of the original genome data.