Multi-project and Multi-profile joint Non-negative Matrix Factorization for cancer omic datasets.

Affiliations

Industrial Engineering Department, University of Los Andes, Bogota 111711, Colombia.

Center for Optimization and Applied Probability, School of Engineering, University of Los Andes, Bogota 111711, Colombia.

Publication information

Bioinformatics. 2021 Dec 11;37(24):4801-4809. doi: 10.1093/bioinformatics/btab579.

Abstract

MOTIVATION

The integration of multi-omic data using machine learning methods has been focused on solving relevant tasks such as predicting sensitivity to a drug or subtyping patients. Recent integration methods, such as joint Non-negative Matrix Factorization, have allowed researchers to exploit the information in the data to unravel the biological processes of multi-omic datasets.

RESULTS

We present a novel method called Multi-project and Multi-profile joint Non-negative Matrix Factorization, capable of integrating data from different sources, such as experimental and observational multi-omic data. The method can generate co-clusters between observations, predict profiles and relate latent variables. We applied the method to integrate low-grade glioma omic profiles from The Cancer Genome Atlas (TCGA) and Cancer Cell Line Encyclopedia projects. The method allowed us to find gene clusters mainly enriched in cancer-associated terms. By comparing biological processes, we identified groups of patients and cell lines that are similar to each other. We predicted the drug profile for patients, and we identified genetic signatures of tumors that are resistant or sensitive to a specific drug.
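
To make the idea of a joint factorization concrete, the sketch below shows a generic joint NMF in which several non-negative profiles measured on the same observations share one factor matrix. It is not the authors' method: the function name `joint_nmf`, the shared-`W` layout, the multiplicative update rule and the synthetic `expression` and `methylation` matrices are all assumptions chosen for illustration.

```python
import numpy as np

def joint_nmf(profiles, rank, n_iter=200, eps=1e-9, seed=0):
    """Generic joint NMF sketch (hypothetical helper, not the published method).

    Each profile X_i (observations x features_i) is approximated as W @ H_i,
    where W is shared across profiles and H_i is specific to profile i.
    """
    rng = np.random.default_rng(seed)
    n_obs = profiles[0].shape[0]
    W = rng.random((n_obs, rank)) + eps
    Hs = [rng.random((rank, X.shape[1])) + eps for X in profiles]
    losses = []
    for _ in range(n_iter):
        # Multiplicative update of each profile-specific factor H_i.
        for i, X in enumerate(profiles):
            Hs[i] *= (W.T @ X) / (W.T @ W @ Hs[i] + eps)
        # Multiplicative update of the shared factor W, pooling all profiles.
        numerator = sum(X @ H.T for X, H in zip(profiles, Hs))
        denominator = W @ sum(H @ H.T for H in Hs) + eps
        W *= numerator / denominator
        # Total squared reconstruction error over all profiles.
        losses.append(sum(np.linalg.norm(X - W @ H) ** 2
                          for X, H in zip(profiles, Hs)))
    return W, Hs, losses

# Synthetic, made-up data: two profiles over the same 30 observations.
rng = np.random.default_rng(1)
W_true = rng.random((30, 3))
expression = W_true @ rng.random((3, 40))
methylation = W_true @ rng.random((3, 25))

W, Hs, losses = joint_nmf([expression, methylation], rank=3)
# Assign each observation to its dominant latent component.
clusters = W.argmax(axis=1)
```

In this sketch the shared matrix `W` gives one latent representation per observation, so taking the largest entry in each row yields a simple cluster assignment, while each `H_i` relates the latent components to the features of one profile.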

AVAILABILITY AND IMPLEMENTATION

The source code repository is publicly available at https://bitbucket.org/dsalazarb/mmjnmf/ (Zenodo DOI: 10.5281/zenodo.5150920).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
