Industrial Engineering Department, University of Los Andes, Bogota 111711, Colombia.
Center for Optimization and Applied Probability, School of Engineering, University of Los Andes, Bogota 111711, Colombia.
Bioinformatics. 2021 Dec 11;37(24):4801-4809. doi: 10.1093/bioinformatics/btab579.
The integration of multi-omic data using machine learning methods has focused on tasks such as predicting sensitivity to a drug or subtyping patients. Recent integration methods, such as joint Non-negative Matrix Factorization, have allowed researchers to exploit the information in the data to unravel the biological processes underlying multi-omic datasets.
We present a novel method called Multi-project and Multi-profile joint Non-negative Matrix Factorization, capable of integrating data from different sources, such as experimental and observational multi-omic data. The method can generate co-clusters between observations, predict profiles and relate latent variables. We applied the method to integrate low-grade glioma omic profiles from The Cancer Genome Atlas (TCGA) and Cancer Cell Line Encyclopedia projects. The method allowed us to find gene clusters mainly enriched in cancer-associated terms. We identified groups of patients and cell lines similar to each other by comparing biological processes. We predicted the drug profile for patients, and we identified genetic signatures for tumors resistant and sensitive to a specific drug.
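To make the underlying idea concrete, the following is a minimal sketch of plain joint Non-negative Matrix Factorization, not the Multi-project and Multi-profile method itself. It assumes that every profile matrix is non-negative and shares the same rows (observations), and it factorizes each profile X_i as W H_i with a shared factor W using standard multiplicative updates for the squared Frobenius error. The names `joint_nmf` and `reconstruction_error`, the rank, and the iteration count are illustrative choices, not taken from the published code.

```python
import numpy as np

def joint_nmf(profiles, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize each non-negative X_i (n x m_i) as W @ H_i with a shared W.

    Assumption: all matrices in `profiles` have the same number of rows,
    indexing the same observations.
    """
    rng = np.random.default_rng(seed)
    n = profiles[0].shape[0]
    W = rng.random((n, rank))                                 # shared factor
    Hs = [rng.random((rank, X.shape[1])) for X in profiles]   # one per profile
    for _ in range(n_iter):
        # Multiplicative update for the shared factor, pooling all profiles.
        numer = sum(X @ H.T for X, H in zip(profiles, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W *= numer / denom
        # Multiplicative update for each profile-specific factor.
        Hs = [H * (W.T @ X) / (W.T @ W @ H + eps)
              for X, H in zip(profiles, Hs)]
    return W, Hs

def reconstruction_error(profiles, W, Hs):
    """Sum of squared Frobenius errors over all profiles."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(profiles, Hs))
```

In a sketch like this, rows of W can be read as latent-variable memberships for observations, and large entries in a row of H_i indicate the features of profile i that load on that latent variable; co-clusters are then typically obtained by thresholding both factors.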
The source code repository is publicly available at https://bitbucket.org/dsalazarb/mmjnmf/ (Zenodo DOI: 10.5281/zenodo.5150920).
Supplementary data are available at Bioinformatics online.