GRACKLE: an interpretable matrix factorization approach for biomedical representation learning.

Author information

Gillenwater Lucas A, Hunter Lawrence E, Costello James C

Affiliations

Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, United States.

Computational Bioscience Program, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, United States.

Publication information

Bioinformatics. 2025 Jul 1;41(Supplement_1):i609-i618. doi: 10.1093/bioinformatics/btaf213.

DOI:10.1093/bioinformatics/btaf213

PMID:40662804

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12261436/

Abstract

MOTIVATION

Disruption in normal gene expression can contribute to the development of diseases and chronic conditions. However, identifying disease-specific gene signatures can be challenging due to the presence of multiple co-occurring conditions and limited sample sizes. Unsupervised representation learning methods, such as matrix decomposition and deep learning, simplify high-dimensional data into understandable patterns, but often do not provide clear biological explanations. Incorporating prior biological knowledge directly can enhance understanding and address small sample sizes. Nevertheless, current models do not jointly consider prior knowledge of molecular interactions and sample labels.

RESULTS

We present GRACKLE, a novel nonnegative matrix factorization (NMF) approach that applies Graph Regularization Across Contextual KnowLedgE. GRACKLE integrates sample similarity and gene similarity matrices based on sample metadata and molecular relationships, respectively. Simulation studies show GRACKLE outperformed other NMF algorithms, especially with increased background noise. GRACKLE effectively stratified breast tumor samples and identified condition-enriched subgroups in individuals with Down syndrome. The model's latent representations aligned with known biological patterns, such as autoimmune conditions and sleep apnea in Down syndrome. GRACKLE's flexibility allows application to various data modalities, offering a robust solution for identifying context-specific molecular mechanisms in biomedical research.
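
To make the idea of regularizing a factorization with both a sample graph and a gene graph more concrete, the following is a minimal sketch of a generic dual graph-regularized NMF with multiplicative updates. It is not the GRACKLE implementation (the authors' code is in the repository listed under Availability and Implementation). The function name, the weights `lam_s` and `lam_g`, the objective, and the update rules are all assumptions taken from the standard graph-regularized NMF formulation, and the similarity matrices are assumed to be symmetric and nonnegative.

```python
import numpy as np


def graph_regularized_nmf(X, A_samples, A_genes, k,
                          lam_s=0.1, lam_g=0.1, n_iter=200, eps=1e-10, seed=0):
    """Illustrative dual graph-regularized NMF (hypothetical helper, not GRACKLE).

    Approximately minimizes
        ||X - W H||_F^2 + lam_s * tr(W^T L_s W) + lam_g * tr(H L_g H^T)
    where L = D - A is the graph Laplacian of each similarity matrix.

    X         : (n_samples, n_genes) nonnegative expression matrix
    A_samples : (n_samples, n_samples) symmetric nonnegative sample similarity
    A_genes   : (n_genes, n_genes) symmetric nonnegative gene similarity
    k         : number of latent components
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))  # sample-by-component loadings
    H = rng.random((k, m))  # component-by-gene loadings

    # Degree matrices of the two graphs.
    D_s = np.diag(A_samples.sum(axis=1))
    D_g = np.diag(A_genes.sum(axis=1))

    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative; eps avoids division by zero.
        W *= (X @ H.T + lam_s * (A_samples @ W)) / (W @ H @ H.T + lam_s * (D_s @ W) + eps)
        H *= (W.T @ X + lam_g * (H @ A_genes)) / (W.T @ W @ H + lam_g * (H @ D_g) + eps)
    return W, H


# Toy usage with random data and fully connected graphs (purely illustrative).
rng = np.random.default_rng(42)
X_toy = rng.random((10, 20))
A_s_toy = np.ones((10, 10)) - np.eye(10)
A_g_toy = np.ones((20, 20)) - np.eye(20)
W_toy, H_toy = graph_regularized_nmf(X_toy, A_s_toy, A_g_toy, k=3)
```

In this sketch, setting `lam_s` and `lam_g` to zero recovers plain NMF, while larger values pull the latent representations of similar samples (rows of `W`) and similar genes (columns of `H`) closer together.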

AVAILABILITY AND IMPLEMENTATION

GRACKLE is available at: https://github.com/lagillenwater/GRACKLE.
