Chong Allen, Zhang Guanglan, Bajic Vladimir B
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore.
Genomics. 2004 Oct;84(4):762-6. doi: 10.1016/j.ygeno.2004.05.007.
We present Information for the Coordinates of Exons (ICE), a comprehensive database of genomic splice sites (SSs) for 10,803 human genes. ICE contains 91,846 donor-acceptor site pairs, each supported by the alignment of "full-length" human mRNAs (including transcript variants) to human genomic sequences. ICE represents the largest collection of human SSs known to date and provides a significant resource for both molecular biologists and bioinformaticians. A user can visualize and extract the genomic sequences surrounding the SSs of each donor-acceptor pair and can also visualize the primary structure of individual genes. In this article we list the 22 most frequently found canonical and noncanonical splice sites. The four most represented donor-acceptor pairs (GT-AG, GC-AG, AT-AC, and GT-GG) account for 99.16% of our data set. In addition, we calculated SS matrix models for the three most common donor-acceptor pairs. The database focuses on providing SSs and their surrounding sequence information, associated SS and sequence characteristics, and their relation to overall transcript structure. It supports targeted searches and presents the evidence for each gene structure.
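The pair tallies and matrix models mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the ICE implementation: the function names, the 0-based half-open intron coordinates, the toy sequence, and the reading of "matrix model" as a plain per-position base-frequency matrix are all assumptions made for illustration.

```python
from collections import Counter


def splice_pair(genome: str, intron_start: int, intron_end: int) -> str:
    """Return the donor-acceptor dinucleotide pair of one intron.

    Assumes 0-based, half-open coordinates [intron_start, intron_end)
    on the forward strand (a hypothetical convention, not ICE's).
    """
    donor = genome[intron_start:intron_start + 2]   # first two intronic bases
    acceptor = genome[intron_end - 2:intron_end]    # last two intronic bases
    return f"{donor}-{acceptor}".upper()


def pair_frequencies(genome: str, introns: list[tuple[int, int]]) -> dict[str, float]:
    """Fraction of introns carrying each donor-acceptor pair."""
    counts = Counter(splice_pair(genome, start, end) for start, end in introns)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def position_frequency_matrix(sequences: list[str]) -> list[dict[str, float]]:
    """Per-position base frequencies over equal-length sequence windows."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        column = Counter(seq[i].upper() for seq in sequences)
        matrix.append({base: column.get(base, 0) / len(sequences) for base in "ACGT"})
    return matrix


# Toy data (invented): three exons separated by two introns.
genome = "ATGGCC" + "GTAAGTTTTCAG" + "GGATCC" + "GCAAGTTTACAG" + "TTAA"
introns = [(6, 18), (24, 36)]

frequencies = pair_frequencies(genome, introns)

# Six-base windows starting at each donor site, used to build the matrix.
donor_windows = [genome[start:start + 6] for start, _ in introns]
donor_matrix = position_frequency_matrix(donor_windows)
```

On real data, the introns would come from mRNA-to-genome alignments, and each pair class (for example GT-AG) would get its own matrix built from windows around its donor and acceptor sites.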