Patient similarity by joint matrix trifactorization to identify subgroups in acute myeloid leukemia.

Author information

Vitali F, Marini S, Pala D, Demartini A, Montoli S, Zambelli A, Bellazzi R

Affiliations

Center for Biomedical Informatics and Biostatistics, The University of Arizona, Tucson, Arizona, USA.

BIO5 Institute, The University of Arizona, Tucson, Arizona, USA.

Publication information

JAMIA Open. 2018 May 14;1(1):75-86. doi: 10.1093/jamiaopen/ooy008. eCollection 2018 Jul.

DOI:10.1093/jamiaopen/ooy008

PMID:31984320

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6951984/

Abstract

OBJECTIVE

Computing patient similarity is of great interest in precision oncology because it supports clustering and subgroup identification, which can ultimately lead to tailored therapies. The availability of large amounts of biomedical data, characterized by large feature sets and sparse content, motivates new methods for computing patient similarity that can fuse heterogeneous data sources with the available knowledge.

MATERIALS AND METHODS

In this work, we developed a data integration approach based on matrix trifactorization that computes patient similarities by integrating several sources of data and knowledge. We assessed the accuracy of the proposed method (1) on several synthetic data sets whose similarity structures are affected by increasing levels of noise and data sparsity, and (2) on a real data set from an acute myeloid leukemia (AML) study. Finally, we compared the results with those of traditional similarity calculation methods.
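The abstract does not spell out the factorization itself. The following is a minimal sketch of joint (collective) non-negative matrix tri-factorization under stated assumptions. Each relation matrix R_ij is approximated as G_i S_ij G_j^T, and the factor of an object type shared by two relations is reused in both. The setup is hypothetical: a patients × genes matrix `r_pg` and a genes × knowledge-entity matrix `r_gk` (for example, gene annotations) share the gene factor `gg`. The factors are fitted with standard Lee-Seung-style multiplicative updates, and patient similarity is taken as the cosine similarity between rows of the patient factor `gp`. All function names are illustrative. This is not the authors' implementation, and it omits any constraint or regularization terms the published method may use.

```python
import math
import random


def matmul(a, b):
    """Product of two dense matrices stored as lists of row lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def update(f, num, den, eps=1e-12):
    """Multiplicative update f <- f * num / den; keeps every entry non-negative."""
    return [[fv * nv / (dv + eps) for fv, nv, dv in zip(fr, nr, dr)]
            for fr, nr, dr in zip(f, num, den)]


def random_matrix(rows, cols, rng):
    return [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rows)]


def joint_nmtf(r_pg, r_gk, k_p, k_g, k_k, iters=300, seed=0):
    """Minimize ||R_pg - Gp Spg Gg^T||^2 + ||R_gk - Gg Sgk Gk^T||^2 (hypothetical
    two-relation setup). Gg is shared, so the knowledge matrix R_gk also shapes Gp."""
    rng = random.Random(seed)
    gp = random_matrix(len(r_pg), k_p, rng)
    gg = random_matrix(len(r_gk), k_g, rng)
    gk = random_matrix(len(r_gk[0]), k_k, rng)
    spg = random_matrix(k_p, k_g, rng)
    sgk = random_matrix(k_g, k_k, rng)
    for _ in range(iters):
        # Patient factor: R_pg ~ Gp A with A = Spg Gg^T.
        a = matmul(spg, transpose(gg))
        at = transpose(a)
        gp = update(gp, matmul(r_pg, at), matmul(gp, matmul(a, at)))
        # Shared gene factor: R_pg^T ~ Gg B and R_gk ~ Gg C.
        b = matmul(transpose(spg), transpose(gp))
        c = matmul(sgk, transpose(gk))
        bt, ct = transpose(b), transpose(c)
        num = add(matmul(transpose(r_pg), bt), matmul(r_gk, ct))
        den = matmul(gg, add(matmul(b, bt), matmul(c, ct)))
        gg = update(gg, num, den)
        # Knowledge-entity factor: R_gk^T ~ Gk D with D = Sgk^T Gg^T.
        d = matmul(transpose(sgk), transpose(gg))
        dt = transpose(d)
        gk = update(gk, matmul(transpose(r_gk), dt), matmul(gk, matmul(d, dt)))
        # Core matrices linking the latent factors of each relation.
        gpt, ggt, gkt = transpose(gp), transpose(gg), transpose(gk)
        spg = update(spg, matmul(matmul(gpt, r_pg), gg),
                     matmul(matmul(matmul(gpt, gp), spg), matmul(ggt, gg)))
        sgk = update(sgk, matmul(matmul(ggt, r_gk), gk),
                     matmul(matmul(matmul(ggt, gg), sgk), matmul(gkt, gk)))
    return gp, gg, gk, spg, sgk


def joint_loss(r_pg, r_gk, gp, gg, gk, spg, sgk):
    """Sum of squared reconstruction errors over both relation matrices."""
    def sq_err(r, approx):
        return sum((x - y) ** 2 for rr, ar in zip(r, approx) for x, y in zip(rr, ar))
    return (sq_err(r_pg, matmul(matmul(gp, spg), transpose(gg)))
            + sq_err(r_gk, matmul(matmul(gg, sgk), transpose(gk))))


def patient_similarity(gp):
    """Cosine similarity between rows of the patient factor Gp."""
    norms = [math.sqrt(sum(v * v for v in row)) or 1.0 for row in gp]
    return [[sum(x * y for x, y in zip(ri, rj)) / (ni * nj) for rj, nj in zip(gp, norms)]
            for ri, ni in zip(gp, norms)]


def patient_clusters(gp):
    """Assign each patient to the latent patient cluster with the largest loading."""
    return [max(range(len(row)), key=row.__getitem__) for row in gp]
```

Plain Python lists keep the sketch dependency-free. A NumPy version would be shorter and much faster on realistic data. Like any NMF-type method, the fitted factors depend on the random initialization, so in practice one would typically compare several seeds.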

RESULTS

In the analysis of the synthetic data sets, where the ground truth is known, we measured the ability to reconstruct the correct clusters. In the AML study, we compared the Kaplan-Meier survival curves of the resulting clusters and assessed their statistical difference with the log-rank test. In the presence of noise and sparse data, our data integration method outperformed the other techniques on both the synthetic and the AML data.
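The Kaplan-Meier and log-rank comparison described in this paragraph can be illustrated with a small sketch. It assumes right-censored data given as parallel lists of times and event flags (1 = event observed, 0 = censored), and it compares only two clusters. Comparing more clusters requires the multi-group form of the test, with k − 1 degrees of freedom. The function names are hypothetical and are not taken from the paper.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate as a list of (event_time, S(t)) pairs."""
    surv, curve = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic with 1 df, p-value)."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_a = expected_a = variance = 0.0
    for t in sorted({u for u, e, _ in data if e}):
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in group A
        n = sum(1 for u, _, _ in data if u >= t)               # at risk overall
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this event time
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))
```

For example, `logrank_two_groups([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])` gives a statistic of about 5.05 and a p-value of about 0.025. For more than two clusters, the `lifelines` package provides `lifelines.statistics.multivariate_logrank_test`.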

DISCUSSION

When multiple heterogeneous data sources are available, a matrix trifactorization technique can successfully fuse all the information into a joint model. We demonstrated that this approach can be applied efficiently to discover meaningful patient similarities. It may therefore be considered a reliable data-driven strategy for defining new research hypotheses in precision oncology.

CONCLUSION

The better performance of the proposed approach gives it an advantage over previous methods in providing accurate patient similarities to support precision medicine.

Similar articles

A Novel 85-Gene Expression Signature Predicts Unfavorable Prognosis in Acute Myeloid Leukemia.

Technol Cancer Res Treat. 2021 Jan-Dec;20:15330338211004933. doi: 10.1177/15330338211004933.

Computing Drug-Drug Similarity from Patient-Centric Data.

Bioengineering (Basel). 2023 Feb 1;10(2):182. doi: 10.3390/bioengineering10020182.

Identifying a novel 5-gene signature predicting clinical outcomes in acute myeloid leukemia.

Clin Transl Oncol. 2021 Mar;23(3):648-656. doi: 10.1007/s12094-020-02460-1. Epub 2020 Aug 10.

Clustering Sparse Data With Feature Correlation With Application to Discover Subtypes in Cancer.

IEEE Access. 2020;8:67775-67789. doi: 10.1109/access.2020.2982569. Epub 2020 Mar 26.

Personalizing Chinese medicine by integrating molecular features of diseases and herb ingredient information: application to acute myeloid leukemia.

Oncotarget. 2017 Jun 27;8(26):43579-43591. doi: 10.18632/oncotarget.16983.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification.

An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages.

J Biomed Inform. 2014 Jun;49:255-68. doi: 10.1016/j.jbi.2014.03.005. Epub 2014 Mar 16.

Consolidation therapy for adult acute myeloid leukemia: a systematic analysis according to evidence based medicine.

Leuk Lymphoma. 2006 Jun;47(6):1091-102. doi: 10.1080/10428190500513595.

GO functional similarity clustering depends on similarity measure, clustering method, and annotation completeness.

BMC Bioinformatics. 2019 Mar 27;20(1):155. doi: 10.1186/s12859-019-2752-2.

Consolidation therapy for adult acute myeloid leukemia: a systematic analysis according to evidence based medicine.成人急性髓系白血病的巩固治疗：基于循证医学的系统分析

Leuk Lymphoma. 2006 Jun;47(6):1091-102. doi: 10.1080/10428190500513595.

GO functional similarity clustering depends on similarity measure, clustering method, and annotation completeness.GO 功能相似性聚类取决于相似性度量、聚类方法和注释完整性。

BMC Bioinformatics. 2019 Mar 27;20(1):155. doi: 10.1186/s12859-019-2752-2.

Cited by

Uncovering the Understanding of the Concept of Patient Similarity in Cancer Research and Treatment: Scoping Review.

J Med Internet Res. 2025 Aug 18;27:e71906. doi: 10.2196/71906.

miss-SNF: a multimodal patient similarity network integration approach to handle completely missing data sources.

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf150.

Simplicity within biological complexity.

Bioinform Adv. 2025 Feb 6;5(1):vbae164. doi: 10.1093/bioadv/vbae164. eCollection 2025.

Heterogeneous data integration methods for patient similarity networks.

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac207.

AMR-meta: a k-mer and metafeature approach to classify antimicrobial resistance from high-throughput short-read metagenomics data.

Gigascience. 2022 May 18;11. doi: 10.1093/gigascience/giac029.

Performance Assessment of the Network Reconstruction Approaches on Various Interactomes.

Front Mol Biosci. 2021 Oct 5;8:666705. doi: 10.3389/fmolb.2021.666705. eCollection 2021.

Using Domain Knowledge and Data-Driven Insights for Patient Similarity Analytics.

J Pers Med. 2021 Jul 22;11(8):699. doi: 10.3390/jpm11080699.

Linear functional organization of the omic embedding space.

Bioinformatics. 2021 Nov 5;37(21):3839-3847. doi: 10.1093/bioinformatics/btab487.

Fast optimization of non-negative matrix tri-factorization.

PLoS One. 2019 Jun 11;14(6):e0217994. doi: 10.1371/journal.pone.0217994. eCollection 2019.

Towards a data-integrated cell.

Nat Commun. 2019 Feb 18;10(1):805. doi: 10.1038/s41467-019-08797-8.
