Differentiating isoform functions with collaborative matrix factorization.

Affiliations

College of Computer and Information Science, Southwest University, Chongqing 400715, China.

Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.

Publication information

Bioinformatics. 2020 Mar 1;36(6):1864-1871. doi: 10.1093/bioinformatics/btz847.

Abstract

MOTIVATION

Isoforms are alternatively spliced mRNAs of genes. They can be translated into different proteoforms (protein variants) and thus greatly increase functional diversity. Differentiating the functions of isoforms (or proteoforms) helps in understanding the underlying pathology of various complex diseases at a finer granularity. Because existing functional genomic databases record annotations almost exclusively at the gene level and rarely at the isoform level, differentiating isoform functions is more challenging than traditional gene-level function prediction.

RESULTS

Several approaches have been proposed to differentiate the functions of isoforms. They generally follow the multi-instance learning paradigm, viewing each gene as a bag and its spliced isoforms as instances, and push the functions of bags onto instances. These approaches implicitly assume that the collected gene annotations are complete, and they integrate only multiple RNA-seq datasets; as a result, their performance is compromised. We propose a data-integrative solution, DisoFun, to Differentiate isoform Functions with collaborative matrix factorization. DisoFun assumes that the functional annotations of a gene are aggregated from those of its key isoforms. It collaboratively factorizes the isoform data matrix and the gene-term data matrix (which stores the Gene Ontology annotations of genes) into low-rank matrices to simultaneously explore the latent key isoforms and achieve function prediction by aggregating predictions to their originating genes. In addition, it leverages the protein-protein interaction (PPI) network and the Gene Ontology structure to further coordinate the matrix factorization. Extensive experimental results show that DisoFun improves the area under the receiver operating characteristic curve and the area under the precision-recall curve of existing solutions by at least 7.7% and 28.9%, respectively. We further investigated DisoFun on four exemplar genes (LMNA, ADAM15, BCL2L1 and CFLAR) with known isoform-level functions and observed that it differentiates the functions of their isoforms with 90.5% accuracy.
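
The bag-and-instance view described above, in which gene-level predictions are obtained by aggregating isoform-level predictions, can be illustrated with a minimal sketch. This is not the DisoFun implementation: the function name `aggregate_to_genes`, the choice of `max` as the aggregation operator, the placeholder term labels `T1` and `T2`, and all scores are assumptions made up for illustration.

```python
from collections import defaultdict


def aggregate_to_genes(isoform_scores, gene_of_isoform):
    """Aggregate isoform-level term scores into gene-level term scores.

    isoform_scores:  dict mapping isoform id -> {term: score}
    gene_of_isoform: dict mapping isoform id -> gene id
    Returns a dict mapping gene id -> {term: score}, where each gene-level
    score is the maximum over that gene's isoforms. Using max is an
    assumption of this sketch, not a detail taken from the abstract.
    """
    gene_scores = defaultdict(dict)
    for isoform, term_scores in isoform_scores.items():
        gene = gene_of_isoform[isoform]
        for term, score in term_scores.items():
            previous = gene_scores[gene].get(term, 0.0)
            gene_scores[gene][term] = max(previous, score)
    return dict(gene_scores)


# Toy, made-up scores: two isoforms of a hypothetical geneA, one of geneB.
isoform_scores = {
    "geneA-iso1": {"T1": 0.9, "T2": 0.1},
    "geneA-iso2": {"T1": 0.2, "T2": 0.7},
    "geneB-iso1": {"T1": 0.3},
}
gene_of_isoform = {
    "geneA-iso1": "geneA",
    "geneA-iso2": "geneA",
    "geneB-iso1": "geneB",
}

gene_scores = aggregate_to_genes(isoform_scores, gene_of_isoform)
print(gene_scores)
```

In this toy data each of geneA's two terms is carried by a different isoform, which is the kind of situation in which gene-level annotations alone cannot say which isoform is responsible for which function.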

AVAILABILITY AND IMPLEMENTATION

The code of DisoFun is available at mlda.swu.edu.cn/codes.php?name=DisoFun.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
