3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection.

Author information

Mallik Saurav, Sarkar Anasua, Nath Sagnik, Maulik Ujjwal, Das Supantha, Pati Soumen Kumar, Ghosh Soumadip, Zhao Zhongming

Affiliations

Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA, United States.

Department of Computer Science & Engineering, Jadavpur University, Kolkata, India.

Publication information

Front Genet. 2023 Feb 14;14:1095330. doi: 10.3389/fgene.2023.1095330. eCollection 2023.

Abstract

Handling biomedical big data remains challenging, and integrating multi-modal data followed by significant feature mining (gene signature detection) is especially demanding. With this in mind, we propose a novel framework, three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which uses empirical Bayes statistics, was first applied to each molecular profile individually to extract the statistically significant features; the three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion of the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate the average accuracy and the area under the curve (AUC). Gene modules were identified by average linkage clustering followed by dynamic tree cut, and the module with the highest correlation was taken as the potential gene signature. We used an acute myeloid leukemia (AML) dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC (0.827). We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed state-of-the-art methods in terms of AUC. Furthermore, we included comparative studies with other related methods to support the acceptability of our method. Finally, our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery.
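The abstract does not state the 3PNMF objective or its penalty terms, so the following is only a rough illustration of the three-factor idea: plain, unpenalized non-negative matrix tri-factorization X ≈ F S Gᵀ fitted with standard multiplicative updates for the squared Frobenius loss. The function name `nmtf`, the ranks `k1`/`k2`, the synthetic data, and the column-wise concatenation of modality blocks are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np

def nmtf(X, k1, k2, n_iter=500, eps=1e-10, seed=0):
    """Unpenalized non-negative matrix tri-factorization, X ~ F @ S @ G.T.

    Illustrative only: the penalty terms of 3PNMF are NOT included.
    X: non-negative (samples x features) array.
    Returns F (samples x k1), S (k1 x k2), G (features x k2), all >= 0.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    for _ in range(n_iter):
        # Multiplicative updates for ||X - F S G^T||_F^2 (one factor at a time,
        # others fixed). Factors stay non-negative because every numerator and
        # denominator is built from non-negative matrices.
        F *= (X @ G @ S.T) / (F @ (S @ (G.T @ G) @ S.T) + eps)
        G *= (X.T @ F @ S) / (G @ (S.T @ (F.T @ F) @ S) + eps)
        S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
    return F, S, G

# Hypothetical usage: two reduced modality blocks for the same 20 samples,
# fused by simple column-wise concatenation (an assumption for illustration).
rng = np.random.default_rng(1)
X_expr = rng.random((20, 15))
X_meth = rng.random((20, 10))
X = np.hstack([X_expr, X_meth])
F, S, G = nmtf(X, k1=4, k2=5, n_iter=300)
rel_err = np.linalg.norm(X - F @ S @ G.T) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

The row factor F gives each sample a low-dimensional, non-negative representation shared across the concatenated modalities, which is the kind of fused input a downstream classifier could consume.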
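The module-detection step can be approximated as follows, assuming a genes x samples matrix. Genes are clustered by average linkage on the dissimilarity 1 - |r| (Pearson correlation), but the tree is cut at a fixed height instead of by dynamic tree cut (provided in R by the dynamicTreeCut package), and modules are ranked by mean within-module absolute correlation. The cut height, `min_size`, the scoring rule, the helper names `find_modules`/`best_module`, and the synthetic data are hypothetical choices, not the paper's settings.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def find_modules(expr, cut_height=0.5, min_size=3):
    """Average-linkage gene clustering with a FIXED-height cut.

    expr: genes x samples array. The fixed cut stands in for dynamic tree cut.
    Returns (modules, corr): modules maps cluster label -> gene indices.
    """
    corr = np.corrcoef(expr)                           # gene-gene Pearson r
    dist = 1.0 - np.abs(corr)                          # dissimilarity 1 - |r|
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)   # enforce exact symmetry
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=cut_height, criterion="distance")
    modules = {}
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        if idx.size >= min_size:                       # drop tiny clusters
            modules[int(lab)] = idx
    return modules, corr

def best_module(modules, corr):
    """Pick the module with the highest mean off-diagonal |r| (assumed rule)."""
    def score(idx):
        k = idx.size
        sub = np.abs(corr[np.ix_(idx, idx)])
        return (sub.sum() - k) / (k * (k - 1))         # exclude the diagonal
    return max(modules, key=lambda lab: score(modules[lab]))

# Hypothetical usage on synthetic data with two planted modules.
rng = np.random.default_rng(0)
n = 100
base1, base2 = rng.normal(size=n), rng.normal(size=n)
expr = np.vstack([
    base1 + 0.2 * rng.normal(size=(5, n)),   # tight module: genes 0-4
    base2 + 0.5 * rng.normal(size=(5, n)),   # looser module: genes 5-9
    rng.normal(size=(10, n)),                # unrelated genes 10-19
])
modules, corr = find_modules(expr)
print(sorted(modules[best_module(modules, corr)].tolist()))
```

A fixed cut is simpler to reason about but treats every branch of the dendrogram the same way; dynamic tree cut adapts the cut to the branch shape, which is why it is often preferred for gene co-expression modules.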

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f86e/9971618/deec3d1cb10e/fgene-14-1095330-g001.jpg
