Li Wenyuan, Kang Shuli, Liu Chun-Chi, Zhang Shihua, Shi Yi, Liu Yan, Zhou Xianghong Jasmine
Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA, Institute of Genomics and Bioinformatics, National Chung Hsing University, Taiwan 40227, Republic of China, National Center for Mathematics and Interdisciplinary Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China and Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA.
Nucleic Acids Res. 2014 Apr;42(6):e39. doi: 10.1093/nar/gkt1362. Epub 2013 Dec 25.
Alternative transcript processing is an important mechanism for generating functional diversity in genes. However, little is known about the precise functions of individual isoforms, even though proteins (translated from transcript isoforms), not genes, are the carriers of function. By integrating multiple human RNA-seq data sets, we carried out the first systematic prediction of isoform functions, enabling high-resolution functional annotation of the human transcriptome. Unlike gene function prediction, isoform function prediction faces a unique challenge: the lack of training data, because all known functional annotations are at the gene level. To address this challenge, we modelled the gene-isoform relationships as multiple instance data and developed a novel label propagation method to predict functions. Our method achieved an average area under the receiver operating characteristic curve of 0.67 and assigned functions to 15 572 isoforms. Interestingly, we observed that different functions have different sensitivities to alternative isoform processing, and that the functional diversity of isoforms from the same gene is positively correlated with their tissue expression diversity. Finally, we surveyed the literature to validate our predictions for a number of apoptotic genes. Strikingly, for the well-known 'TP53' gene, we not only accurately identified the apoptosis regulation function of its five isoforms, but also correctly predicted the direction of the regulation.
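The multiple instance framing above can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a generic label propagation loop under stated assumptions. Each gene is treated as a "bag" of isoforms: a gene annotated with a function is assumed to have at least one isoform carrying it, and a gene known to lack the function is assumed to have none. The function name `mi_label_propagation`, the parameters `alpha`, `outer` and `inner`, and the toy similarity matrix `W` are all hypothetical.

```python
import numpy as np

def normalize(W):
    """Symmetric normalization D^-1/2 W D^-1/2 of a non-negative similarity matrix."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def mi_label_propagation(W, gene_of, gene_label, alpha=0.8, outer=10, inner=50):
    """Toy multiple-instance label propagation (illustrative assumption only).

    W          : (n, n) isoform-isoform similarity, e.g. from co-expression
    gene_of    : length-n sequence giving the gene of each isoform
    gene_label : dict gene -> 1 (has the function) or 0 (lacks it);
                 genes absent from the dict are unlabeled
    Returns one score per isoform; higher means more likely to carry the function.
    """
    S = normalize(np.asarray(W, dtype=float))
    gene_of = np.asarray(gene_of)
    n = S.shape[0]

    # Initial seeds: isoforms of positive genes start at +1,
    # isoforms of negative genes at -1, everything else at 0.
    y = np.zeros(n)
    for g, lab in gene_label.items():
        y[gene_of == g] = 1.0 if lab == 1 else -1.0

    f = y.copy()
    for _ in range(outer):
        # Standard propagation step: mix neighbour scores with the seeds.
        for _ in range(inner):
            f = alpha * (S @ f) + (1 - alpha) * y
        # Multiple-instance step: in each positive gene keep only the
        # top-scoring isoform as the seed ("witness"); negative genes stay -1.
        y_new = np.zeros(n)
        for g, lab in gene_label.items():
            idx = np.flatnonzero(gene_of == g)
            if lab == 1:
                y_new[idx[np.argmax(f[idx])]] = 1.0
            else:
                y_new[idx] = -1.0
        if np.array_equal(y_new, y):
            break
        y = y_new
    return f

# Toy data: gene A (positive) has isoforms 0 and 1, gene B (negative) has
# isoforms 2 and 3, gene C (unlabeled) has isoform 4.
W = np.zeros((5, 5))
for i, j, w in [(0, 4, 1.0), (1, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0), (0, 1, 0.1)]:
    W[i, j] = W[j, i] = w
scores = mi_label_propagation(W, ["A", "A", "B", "B", "C"], {"A": 1, "B": 0})
```

In this toy network, isoform 1 of the positive gene resembles the isoforms of the negative gene, so it ends up scoring below isoform 0, while the unlabeled isoform 4 picks up a positive score from its similarity to isoform 0. That is how a gene-level annotation could, in principle, be resolved to individual isoforms.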