Lees Jonathan G, Ranea Juan A, Orengo Christine A
Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, UK.
Department of Molecular Biology and Biochemistry-CIBER de Enfermedades Raras, University of Malaga, Malaga, 29071, Spain.
BMC Genomics. 2015 Aug 16;16(1):608. doi: 10.1186/s12864-015-1674-2.
In complex metazoans, a given gene frequently codes for multiple protein isoforms through processes such as alternative splicing. Large-scale functional annotation of these isoforms is a key challenge for functional genomics. This annotation gap is widening as technologies such as RNA-Seq identify large numbers of multi-transcript genes. Furthermore, attempts to characterise the functions of splicing in an organism are complicated by the difficulty of distinguishing functional isoforms from those produced by splicing errors or transcriptional noise. Tools to help prioritise candidate isoforms for testing are largely absent.
In this study, we implement a Time-course Switch (TS) score that ranks isoforms by their likelihood of producing additional functions, based on their developmental expression profiles as reported by modENCODE. The TS score allows us to better investigate the functional roles of the different isoforms expressed by multi-transcript genes. From this analysis, we find that isoforms with high TS scores have sequence feature changes consistent with more deterministic splicing and with functional changes, and that they tend to gain domains or whole exons which could carry additional functions. Furthermore, these functions appear to be particularly important for essential regulatory roles, establishing functional isoform switching as key for regulatory processes. Based on the TS score, we develop a Transcript Annotations Pipeline for Alternative Splicing (TAPAS) that identifies functional neighbourhoods of potentially interesting isoforms.
We have identified a subset of protein isoforms which appear to have high functional significance, particularly in regulation. This has been made possible through the development of novel methods that make use of transcript expression profiles. The methods and analyses we present here represent important first steps in the development of tools to address the near-complete lack of isoform-specific function annotation. In turn, these tools allow us to characterise the regulatory functions of alternative splicing in more detail.