Simple and Scalable Algorithms for Cluster-Aware Precision Medicine.

Authors

Buch Amanda M, Liston Conor, Grosenick Logan

Affiliations

Dept. of Psychiatry & BMRI, Weill Cornell Medicine, Cornell University.

Publication information

Proc Mach Learn Res. 2024 May;238:136-144.

PMID:39015742

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11251711/

Abstract

AI-enabled precision medicine promises a transformational improvement in healthcare outcomes. However, training on biomedical data presents significant challenges as they are often high dimensional, clustered, and of limited sample size. To overcome these challenges, we propose a simple and scalable approach for cluster-aware embedding that combines latent factor methods with a convex clustering penalty in a modular way. Our novel approach overcomes the complexity and limitations of current joint embedding and clustering methods and enables hierarchically clustered principal component analysis (PCA), locally linear embedding (LLE), and canonical correlation analysis (CCA). Through numerical experiments and real-world examples, we demonstrate that our approach outperforms fourteen clustering methods on highly underdetermined problems (e.g., with limited sample size) as well as on large sample datasets. Importantly, our approach does not require the user to choose the desired number of clusters, yields improved model selection if they do, and yields interpretable hierarchically clustered embedding dendrograms. Thus, our approach improves significantly on existing methods for identifying patient subgroups in multiomics and neuroimaging data and enables scalable and interpretable biomarkers for precision medicine.
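As a rough illustration of what combining a latent factor method with a convex clustering penalty can look like, the sketch below pairs a truncated-SVD (PCA) reconstruction with a pairwise fusion penalty on the reconstructed samples. The objective form, the function names, and the parameter `lam` are assumptions made for illustration only; the abstract does not state the authors' exact formulation or solver, and this code merely evaluates such an objective rather than optimizing it.

```python
import numpy as np


def pca_reconstruction(X, rank):
    """Best rank-`rank` approximation of column-centered X via truncated SVD."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank] + mean


def convex_clustering_penalty(X_hat, weights=None):
    """Sum over sample pairs i < j of w_ij * ||X_hat[i] - X_hat[j]||_2."""
    n = X_hat.shape[0]
    i, j = np.triu_indices(n, k=1)
    dists = np.linalg.norm(X_hat[i] - X_hat[j], axis=1)
    # Uniform weights unless an (n, n) weight matrix is supplied (assumption).
    w = np.ones_like(dists) if weights is None else weights[i, j]
    return float(np.sum(w * dists))


def cluster_aware_objective(X, X_hat, lam, weights=None):
    """Hypothetical objective: squared reconstruction error plus fusion penalty."""
    fit = 0.5 * np.sum((X - X_hat) ** 2)
    return float(fit + lam * convex_clustering_penalty(X_hat, weights))


# Toy underdetermined example: 10 samples, 20 features, two groups of samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (5, 20)), rng.normal(1.0, 0.1, (5, 20))])
X_hat = pca_reconstruction(X, rank=2)
print(cluster_aware_objective(X, X_hat, lam=0.1))
```

In a penalty of this kind, increasing `lam` pulls the reconstructed samples toward one another, so samples fuse into progressively fewer groups as `lam` grows. That is one way a hierarchy can arise without fixing the number of clusters in advance, which is consistent with the dendrograms described in the abstract.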
