
3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection.

Authors

Mallik Saurav, Sarkar Anasua, Nath Sagnik, Maulik Ujjwal, Das Supantha, Pati Soumen Kumar, Ghosh Soumadip, Zhao Zhongming

Affiliations

Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, United States.

Department of Computer Science & Engineering, Jadavpur University, Kolkata, India.

Publication

Front Genet. 2023 Feb 14;14:1095330. doi: 10.3389/fgene.2023.1095330. eCollection 2023.

DOI:10.3389/fgene.2023.1095330
PMID:36865387
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9971618/
Abstract

Handling biomedical big data is challenging, and integrating multi-modal data and then mining significant features (gene signature detection) is especially difficult. To address this, we propose a novel framework, three-factor penalized, non-negative matrix factorization-based multiple kernel learning with soft margin hinge loss (3PNMF-MKL), for multi-modal data integration followed by gene signature detection. In brief, limma, which employs empirical Bayes statistics, was first applied to each individual molecular profile to extract the statistically significant features. The three-factor penalized non-negative matrix factorization method was then used for data/matrix fusion on the reduced feature sets. Multiple kernel learning models with soft margin hinge loss were deployed to estimate average accuracy scores and the area under the curve (AUC). Gene modules were identified by successive average linkage clustering and dynamic tree cut, and the module with the highest correlation was taken as the potential gene signature. We used an acute myeloid leukemia dataset from The Cancer Genome Atlas (TCGA) repository containing five molecular profiles. Our algorithm generated a 50-gene signature that achieved a high classification AUC of 0.827. We explored the functions of the signature genes using pathway and Gene Ontology (GO) databases. Our method outperformed the state-of-the-art methods in terms of AUC. We also included comparative studies with other related methods to strengthen the case for our method. Finally, our algorithm can be applied to any multi-modal dataset for data integration followed by gene module discovery.
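
The classification step in the abstract (multiple kernel learning with a soft margin hinge loss, scored by AUC) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn, uses synthetic data in place of the fused TCGA matrix, and replaces learned kernel weights with fixed uniform weights. The function names `combined_kernel` and `cross_validated_auc` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def combined_kernel(A, B, weights):
    """Weighted sum of linear, RBF and degree-2 polynomial kernels between rows of A and B."""
    kernels = [
        linear_kernel(A, B),
        rbf_kernel(A, B, gamma=1.0 / A.shape[1]),
        polynomial_kernel(A, B, degree=2),
    ]
    return sum(w * K for w, K in zip(weights, kernels))


def cross_validated_auc(X, y, weights=(1 / 3, 1 / 3, 1 / 3), C=1.0, n_splits=5, seed=0):
    """Mean AUC of a soft-margin SVM (hinge loss) trained on the combined kernel."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train, test in cv.split(X, y):
        clf = SVC(kernel="precomputed", C=C)  # C sets the soft-margin penalty
        # Training Gram matrix: (n_train, n_train)
        clf.fit(combined_kernel(X[train], X[train], weights), y[train])
        # Test kernel must be (n_test, n_train): test rows against training rows
        scores = clf.decision_function(combined_kernel(X[test], X[train], weights))
        aucs.append(roc_auc_score(y[test], scores))
    return float(np.mean(aucs))


# Synthetic stand-in for a fused samples-by-features matrix (assumption, not TCGA data)
X, y = make_classification(n_samples=120, n_features=20, n_informative=5, random_state=0)
print(round(cross_validated_auc(X, y), 3))
```

A full multiple kernel learning method would learn the kernel weights jointly with the classifier; uniform weights are used here only to keep the example short.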


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f86e/9971618/3b9bd2144772/fgene-14-1095330-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f86e/9971618/deec3d1cb10e/fgene-14-1095330-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f86e/9971618/1600e7df5583/fgene-14-1095330-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f86e/9971618/451106f7d1dc/fgene-14-1095330-g003.jpg

Similar articles

1
3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection.
Front Genet. 2023 Feb 14;14:1095330. doi: 10.3389/fgene.2023.1095330. eCollection 2023.
2
Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer.
BMC Bioinformatics. 2022 Apr 28;23(Suppl 3):153. doi: 10.1186/s12859-022-04678-y.
3
Soft margin multiple kernel learning.
IEEE Trans Neural Netw Learn Syst. 2013 May;24(5):749-61. doi: 10.1109/TNNLS.2012.2237183.
4
PathME: pathway based multi-modal sparse autoencoders for clustering of patient-level multi-omics data.
BMC Bioinformatics. 2020 Apr 16;21(1):146. doi: 10.1186/s12859-020-3465-2.
5
Identification of drug-target interactions via multiple kernel-based triple collaborative matrix factorization.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbab582.
6
Multi-omics data fusion using adaptive GTO guided Non-negative matrix factorization for cancer subtype discovery.
Comput Methods Programs Biomed. 2023 Jan;228:107246. doi: 10.1016/j.cmpb.2022.107246. Epub 2022 Nov 16.
7
JDSNMF: Joint Deep Semi-Non-Negative Matrix Factorization for Learning Integrative Representation of Molecular Signals in Alzheimer's Disease.
J Pers Med. 2021 Jul 21;11(8):686. doi: 10.3390/jpm11080686.
8
Multiple-kernel learning for genomic data mining and prediction.
BMC Bioinformatics. 2019 Aug 15;20(1):426. doi: 10.1186/s12859-019-2992-1.
9
A Novel Graph Topology-Based GO-Similarity Measure for Signature Detection From Multi-Omics Data and its Application to Other Problems.
IEEE/ACM Trans Comput Biol Bioinform. 2022 Mar-Apr;19(2):773-785. doi: 10.1109/TCBB.2020.3020537. Epub 2022 Apr 1.
10
Performance Comparison of Deep Learning Autoencoders for Cancer Subtype Detection Using Multi-Omics Data.
Cancers (Basel). 2021 Apr 22;13(9):2013. doi: 10.3390/cancers13092013.
