Li Hong-Dong, Omenn Gilbert S, Guan Yuanfang
Brief Bioinform. 2016 Nov;17(6):1024-1031. doi: 10.1093/bib/bbv109. Epub 2016 Jan 6.
The products of multi-exon genes are a mixture of alternatively spliced isoforms, and the proteins translated from them can have similar, different or even opposing functions. It is therefore essential to differentiate and annotate functions for individual isoforms. Computational approaches provide an efficient complement to expensive and time-consuming experimental studies. The input data for these methods range from DNA sequence, RNA selection pressure, expressed sequence tags and full-length complementary DNA to exon arrays, RNA-seq expression and proteomic data. Notably, RNA-seq technology generates quantitative profiles of transcript expression at the genome scale, making an unprecedented amount of expression data available for developing isoform function prediction methods. Integrative analysis of these data at different molecular levels enables a proteogenomic approach to systematically interrogate isoform functions. Here, we briefly review the state-of-the-art methods according to their input data sources, discuss their advantages and limitations and point out potential ways to improve prediction accuracy.