Systematically differentiating functions for alternatively spliced isoforms through integrating RNA-seq data.

Affiliation

Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America.

Publication information

PLoS Comput Biol. 2013;9(11):e1003314. doi: 10.1371/journal.pcbi.1003314. Epub 2013 Nov 7.

Abstract

Integrating large-scale functional genomic data has significantly accelerated our understanding of gene functions. However, no algorithm has been developed to differentiate functions for isoforms of the same gene using high-throughput genomic data. This is because standard supervised learning requires 'ground-truth' functional annotations, which are lacking at the isoform level. To address this challenge, we developed a generic framework that interrogates public RNA-seq data at the transcript level to differentiate functions for alternatively spliced isoforms. For a specific function, our algorithm identifies the 'responsible' isoform(s) of a gene and generates classifying models at the isoform level instead of at the gene level. Through cross-validation, we demonstrated that our algorithm is effective in assigning functions to genes, especially the ones with multiple isoforms, and robust to gene expression levels and removal of homologous gene pairs. We identified genes in the mouse whose isoforms are predicted to have disparate functionalities and experimentally validated the 'responsible' isoforms using data from mammary tissue. With protein structure modeling and experimental evidence, we further validated the predicted isoform functional differences for the genes Cdkn2a and Anxa6. Our generic framework is the first to predict and differentiate functions for alternatively spliced isoforms, instead of genes, using genomic data. It is extendable to any base machine learner and other species with alternatively spliced isoforms, and shifts the current gene-centered function prediction to isoform-level predictions.
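The abstract does not name the learning method, so the following is only an illustrative sketch of how function labels known at the gene level could be pushed down to isoforms. It assumes a multiple-instance-style setup: a gene annotated with the function must have at least one 'responsible' isoform, while every isoform of an unannotated gene is treated as negative. A nearest-centroid scorer stands in for the base learner, and all gene names, isoform names, and expression values are hypothetical toy data, not the authors' implementation or results.

```python
from math import dist
from statistics import fmean


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def score(vec, pos_c, neg_c):
    """Positive when vec lies closer to the positive centroid than the negative one."""
    return dist(vec, neg_c) - dist(vec, pos_c)


def find_responsible_isoforms(pos_genes, neg_genes, max_iter=20):
    """Pick one 'responsible' isoform per positive gene.

    pos_genes / neg_genes map gene -> {isoform name -> expression vector}.
    Returns (gene -> chosen isoform, (positive centroid, negative centroid)).
    """
    # Every isoform of a negative gene is a negative instance.
    neg_c = centroid([v for isoforms in neg_genes.values() for v in isoforms.values()])
    # Start by treating every isoform of a positive gene as positive.
    pos_c = centroid([v for isoforms in pos_genes.values() for v in isoforms.values()])
    chosen = {}
    for _ in range(max_iter):
        # Keep only the best-scoring isoform of each positive gene.
        new_chosen = {
            gene: max(isoforms, key=lambda name: score(isoforms[name], pos_c, neg_c))
            for gene, isoforms in pos_genes.items()
        }
        if new_chosen == chosen:  # selections are stable, stop iterating
            break
        chosen = new_chosen
        # Refit the positive model on the selected isoforms only.
        pos_c = centroid([pos_genes[g][iso] for g, iso in chosen.items()])
    return chosen, (pos_c, neg_c)


def gene_score(isoforms, model):
    """Gene-level score: the maximum score over the gene's isoforms."""
    pos_c, neg_c = model
    return max(score(v, pos_c, neg_c) for v in isoforms.values())


# Hypothetical toy expression profiles (three RNA-seq samples per isoform).
pos_genes = {
    "GeneA": {"A.1": [5.0, 5.0, 0.1], "A.2": [0.2, 0.1, 4.0]},
    "GeneB": {"B.1": [4.5, 5.5, 0.0], "B.2": [0.0, 0.3, 5.0]},
    "GeneD": {"D.1": [5.2, 4.8, 0.2]},  # single-isoform gene
}
neg_genes = {
    "GeneC": {"C.1": [0.1, 0.2, 4.5], "C.2": [0.3, 0.0, 3.8]},
}

chosen, model = find_responsible_isoforms(pos_genes, neg_genes)
print(chosen)
```

In this toy setup the second isoforms of GeneA and GeneB resemble the negative gene's profiles, so the loop settles on the first isoform of each. Scoring a gene by its best isoform is one way to keep gene-level cross-validation possible while the model itself is fitted on isoforms.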

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45ff/3820534/caab88f23c5c/pcbi.1003314.g001.jpg
