Liu Yiwei, Li Hong-Dong, Wang Jianxin
School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China.
Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China.
Bioinformatics. 2024 Dec 26;41(1). doi: 10.1093/bioinformatics/btae742.
Isoforms spliced from the same gene may carry distinct biological functions, so annotating functions at the isoform level provides valuable insight into the functional diversity of genomes. Because experimental approaches for determining isoform functions are time-consuming and costly, computational methods have been proposed. Integrating multi-omics data helps improve model performance by providing complementary evidence about isoform functions. However, current methods make poor use of diverse omics data, primarily because they have limited power to integrate heterogeneous feature domains. Moreover, among multi-omics data, isoform-isoform interactions (IIIs) are a key data source, as isoforms interact with each other to perform functions. Yet IIIs remain largely underutilized in isoform function prediction.
We introduce CrossIsoFun, a multi-omics data analysis framework for isoform function prediction. CrossIsoFun combines omics-specific and cross-omics learning for data integration and function prediction. Specifically, CrossIsoFun uses a graph convolutional network (GCN) as the omics-specific classifier for each data source. The initial label predictions from the GCNs are forwarded to the View Correlation Discovery Network (VCDN) and processed into a cross-omics integrative representation, which is then used to produce the final predictions of isoform functions. In addition, an autoencoder within a cycle-consistency generative adversarial network (cycleGAN) is designed to generate IIIs from protein-protein interactions (PPIs) and thereby enrich the interactomics data. Our method outperforms state-of-the-art methods on three tissue-naive datasets and 15 tissue-specific datasets with mRNA expression, sequence, and PPI data. The predictions of CrossIsoFun are further validated by their consistency with subcellular localization and with isoform-level annotations that have literature support.
CrossIsoFun is freely available at https://github.com/genemine/CrossIsoFun.
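The two-stage design described above (one GCN classifier per omics view, followed by cross-omics fusion of the initial label predictions) can be illustrated with a minimal NumPy sketch. This is not the CrossIsoFun implementation. It assumes a standard two-layer GCN with symmetric adjacency normalization, a single binary function label, and a VCDN-style fusion that takes the outer product of the per-view label probabilities; all function names, dimensions, random inputs, and untrained weights are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer (assumed here)."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)         # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gcn_predict(adj, feats, w1, w2):
    """Two-layer GCN forward pass returning per-isoform label probabilities."""
    a_norm = normalize_adjacency(adj)
    hidden = np.maximum(a_norm @ feats @ w1, 0.0)  # ReLU
    return softmax(a_norm @ hidden @ w2)

def cross_view_tensor(view_probs):
    """Flattened outer product of the per-view label probabilities (assumed fusion)."""
    out = view_probs[0]
    for p in view_probs[1:]:
        out = np.einsum("ni,nj->nij", out, p).reshape(out.shape[0], -1)
    return out

rng = np.random.default_rng(0)
n_isoforms, n_classes, n_hidden = 6, 2, 4

# Three hypothetical views: expression, sequence, interaction (random toy data).
view_probs = []
for n_feats in (5, 7, 3):
    upper = np.triu(rng.random((n_isoforms, n_isoforms)) < 0.4, 1).astype(float)
    adj = upper + upper.T                         # symmetric toy graph
    feats = rng.normal(size=(n_isoforms, n_feats))
    w1 = rng.normal(size=(n_feats, n_hidden))     # untrained weights
    w2 = rng.normal(size=(n_hidden, n_classes))
    view_probs.append(gcn_predict(adj, feats, w1, w2))

fused = cross_view_tensor(view_probs)             # shape: (6, 2 * 2 * 2)
w_out = rng.normal(size=(fused.shape[1], n_classes))
final_probs = softmax(fused @ w_out)              # untrained final classifier
print(fused.shape, final_probs.shape)
```

Because each view contributes a probability vector, every row of `fused` is itself a joint distribution over label combinations across views, which is what lets the final classifier learn from agreement or disagreement between omics sources. In practice the weights would be trained, and a multi-label setting would need one such output per function term or a different output layer.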