Sammeth Michael, Foissac Sylvain, Guigó Roderic
Centre de Regulació Genòmica, Barcelona, Spain.
PLoS Comput Biol. 2008 Aug 8;4(8):e1000147. doi: 10.1371/journal.pcbi.1000147.
Understanding the molecular mechanisms responsible for the regulation of the transcriptome present in eukaryotic cells is one of the most challenging tasks in the postgenomic era. In this regard, alternative splicing (AS) is a key phenomenon contributing to the production of different mature transcripts from the same primary RNA sequence. As a plethora of different transcript forms is available in databases, a first step to uncover the biology that drives AS is to identify the different types of splicing variation that these transcripts reflect. In this work, we present a general definition of the AS event along with a notation system based on the relative positions of the splice sites. This nomenclature univocally and dynamically assigns a specific "AS code" to every possible pattern of splicing variation. On the basis of this definition and the corresponding codes, we have developed a computational tool (AStalavista) that automatically characterizes the complete landscape of AS events in a given transcript annotation of a genome, thus providing a platform to investigate transcriptome diversity across genes, chromosomes, and species. Our analysis reveals that a substantial part of the observed splicing variations (in human, more than a quarter) is ignored in common classification pipelines. We have used AStalavista to investigate and compare the AS landscape of different reference annotation sets in human and in other metazoan species, and found that the proportions of AS events change substantially depending on the annotation protocol, species-specific attributes, and coding constraints acting on the transcripts. The AStalavista system therefore provides a general framework for conducting specific studies investigating the occurrence, impact, and regulation of AS.
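To illustrate how a code could be derived from the relative positions of splice sites, here is a minimal sketch. The abstract does not spell out the notation, so the conventions used here are assumptions made for illustration: sites shared by all variants are ignored, the remaining sites are numbered by their rank in 5' to 3' order on the forward strand, a donor is marked `^`, an acceptor `-`, and a variant with no variable site is written `0`. The function name `as_code` and the input layout are hypothetical and are not taken from AStalavista.

```python
from typing import List, Tuple

# Assumed representation: (genomic position, symbol), '^' = donor, '-' = acceptor.
Site = Tuple[int, str]


def as_code(variants: List[List[Site]]) -> str:
    """Return a code string describing how the variants differ in splice sites."""
    site_sets = [set(v) for v in variants]
    # Sites used by every variant carry no variation and are dropped.
    common = set.intersection(*site_sets)
    variable = sorted(set.union(*site_sets) - common)
    # Number the variable sites by relative position (1-based rank).
    rank = {site: i + 1 for i, site in enumerate(variable)}
    # One chain of (rank, symbol) pairs per variant; empty chains sort first.
    chains = sorted(
        sorted((rank[s], s[1]) for s in v if s in rank) for v in site_sets
    )
    return ",".join(
        "".join(f"{n}{sym}" for n, sym in chain) or "0" for chain in chains
    )


# Exon skipping: the second variant includes an exon spanning 200..300.
skipped = [(100, "^"), (400, "-")]
included = [(100, "^"), (200, "-"), (300, "^"), (400, "-")]
print(as_code([skipped, included]))  # 0,1-2^

# Alternative donor sites at 100 and 150 joined to the same acceptor.
print(as_code([[(100, "^"), (400, "-")], [(150, "^"), (400, "-")]]))  # 1^,2^
```

Because the code depends only on the rank and type of the variable sites, two events at different genomic coordinates receive the same code whenever their splicing pattern is the same, which is what allows events to be counted and compared across genes and annotations.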