Kan Zhengyan, States David, Gish Warren
Department of Genetics, Washington University, St. Louis, Missouri 63110, USA.
Genome Res. 2002 Dec;12(12):1837-45. doi: 10.1101/gr.764102.
The expressed sequence tag (EST) collection in dbEST provides an extensive resource for detecting alternative splicing on a genomic scale. Using genomically aligned ESTs, a computational tool (TAP) was used to identify alternative splice patterns for 6400 known human genes from the RefSeq database. With sufficient EST coverage, one or more alternatively spliced forms could be detected for nearly all genes examined. To identify high (>95%) confidence observations of alternative splicing, splice variants were clustered on the basis of having mutually exclusive structures, and sample statistics were then applied. Through this selection, alternative splices expected at a frequency of >5% within their respective clusters were seen for only 17%-28% of genes. Although intron retention events (potentially unspliced messages) were seen for 36% of the genes overall, the same statistical selection yielded reliable cases of intron retention for <5% of genes. For high-confidence alternative splices in the human ESTs, we also noted significantly higher rates both of cross-species conservation in mouse ESTs and of validation in the GenBank mRNA collection. We suggest that quantitative analytical approaches such as these can aid in selecting useful targets for further experimental characterization and, in so doing, may help elucidate the mechanisms and biological implications of alternative splicing.
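The abstract does not spell out the clustering rule or the exact statistic, so the following is only a minimal sketch of the kind of selection it describes, not TAP's implementation. It assumes that introns are hypothetical (donor, acceptor) coordinate pairs, that overlapping introns are grouped as mutually exclusive alternatives, and that "high confidence" means a one-sided binomial test rejecting a true within-cluster frequency of 5% or less at the 5% level. All function names are illustrative.

```python
from math import comb

# Hypothetical representation (not from TAP): an intron is a (donor, acceptor)
# pair of genomic coordinates, and each EST is the list of introns it spans.


def cluster_overlapping(introns):
    """Sweep sorted introns and group those whose spans overlap (transitively).

    Overlapping introns cannot both be spliced out of one transcript, so each
    cluster approximates a set of mutually exclusive alternatives.
    """
    clusters = []
    cluster_end = None
    for start, end in sorted(set(introns)):
        if clusters and start <= cluster_end:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term over the tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_high_confidence(k, n, min_freq=0.05, alpha=0.05):
    """One-sided test: reject 'true frequency <= min_freq' at level alpha."""
    return binomial_upper_tail(k, n, min_freq) < alpha


def high_confidence_splices(est_introns, min_freq=0.05, alpha=0.05):
    """Return (intron, k, n) for alternatives that pass the frequency test."""
    all_introns = [intron for est in est_introns for intron in est]
    results = []
    for cluster in cluster_overlapping(all_introns):
        if len(cluster) < 2:
            continue  # only one form in this cluster: no alternative splice
        members = set(cluster)
        # n: ESTs informative for this cluster (they use at least one member intron)
        n = sum(1 for est in est_introns if members & set(est))
        for intron in cluster:
            # k: ESTs supporting this particular alternative
            k = sum(1 for est in est_introns if intron in est)
            if is_high_confidence(k, n, min_freq, alpha):
                results.append((intron, k, n))
    return results


# Toy data (invented for illustration): 20 ESTs covering two intron clusters.
ests = (
    [[(100, 200), (500, 600)]] * 13
    + [[(100, 200), (550, 600)]] * 3
    + [[(100, 250), (500, 600)]] * 4
)
for intron, k, n in high_confidence_splices(ests):
    print(intron, f"{k}/{n}")
```

On the toy data, the minor form (100, 250), seen in 4 of 20 ESTs, passes because P(X >= 4) is about 0.016 under a true 5% frequency, while (550, 600), seen in 3 of 20, is dropped because P(X >= 3) is about 0.075. The sweep merges transitively overlapping introns into one cluster, which is a simplification of strict mutual exclusivity.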