Simple and Scalable Algorithms for Cluster-Aware Precision Medicine.

Authors

Buch Amanda M, Liston Conor, Grosenick Logan

Affiliation

Dept. of Psychiatry & BMRI, Weill Cornell Medicine, Cornell University.

Publication

Proc Mach Learn Res. 2024 May;238:136-144.

Abstract

AI-enabled precision medicine promises a transformational improvement in healthcare outcomes. However, training on biomedical data presents significant challenges, as such data are often high-dimensional, clustered, and of limited sample size. To overcome these challenges, we propose a simple and scalable approach for cluster-aware embedding that combines latent factor methods with a convex clustering penalty in a modular way. Our novel approach overcomes the complexity and limitations of current joint embedding and clustering methods and enables hierarchically clustered principal component analysis (PCA), locally linear embedding (LLE), and canonical correlation analysis (CCA). Through numerical experiments and real-world examples, we demonstrate that our approach outperforms fourteen clustering methods on highly underdetermined problems (e.g., with limited sample size) as well as on large-sample datasets. Importantly, our approach does not require the user to choose the number of clusters, improves model selection when the user does specify one, and produces interpretable hierarchically clustered embedding dendrograms. Thus, our approach improves significantly on existing methods for identifying patient subgroups in multiomics and neuroimaging data and enables scalable and interpretable biomarkers for precision medicine.
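To make the idea of "a latent factor method plus a convex clustering penalty" concrete, the sketch below is a deliberately simplified two-stage illustration, not the authors' algorithm. It first computes PCA scores Z, then solves the standard convex clustering problem min_U 0.5·||Z − U||_F² + γ·Σ_(i,j)∈E ||U_i − U_j||₂ with the ADMM splitting commonly used for convex clustering (Chi and Lange, 2015). The unit-weight k-nearest-neighbour edge set, the helper names (`pca_scores`, `knn_edges`, `convex_cluster`, `fused_labels`), and the values of `gamma`, `nu`, and the fusion tolerance are all illustrative assumptions. Rows whose fitted centroids U_i coincide form a cluster, so the number of clusters is never specified directly. Sweeping γ from small to large would trace a merge path (a dendrogram) instead of a single partition.

```python
import numpy as np

def pca_scores(X, k):
    """Project rows of X onto the top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def knn_edges(Z, k):
    """Undirected k-nearest-neighbour edge list (i < j); assumed unit weights."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    edges = set()
    for i in range(Z.shape[0]):
        for j in np.argsort(d[i])[1:k + 1]:  # index 0 is i itself
            edges.add((min(i, int(j)), max(i, int(j))))
    return sorted(edges)

def convex_cluster(Z, edges, gamma, nu=1.0, n_iter=2000):
    """ADMM for min_U 0.5*||Z - U||_F^2 + gamma * sum_l ||U_i - U_j||_2."""
    n, p = Z.shape
    D = np.zeros((len(edges), n))            # edge-incidence matrix
    for l, (i, j) in enumerate(edges):
        D[l, i], D[l, j] = 1.0, -1.0
    A = np.eye(n) + nu * D.T @ D             # fixed matrix for the U-update
    V = D @ Z                                # auxiliary differences V = D U
    Lam = np.zeros((len(edges), p))          # dual variables
    for _ in range(n_iter):
        U = np.linalg.solve(A, Z + D.T @ (Lam + nu * V))
        DU = D @ U
        T = DU - Lam / nu
        norms = np.linalg.norm(T, axis=1, keepdims=True)
        # Group soft-thresholding: prox of (gamma/nu) * ||.||_2 per edge
        V = np.maximum(0.0, 1.0 - (gamma / nu) / np.maximum(norms, 1e-12)) * T
        Lam = Lam + nu * (V - DU)
    return U

def fused_labels(U, edges, tol=1e-3):
    """Connected components over edges whose endpoint centroids have fused."""
    parent = list(range(U.shape[0]))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for i, j in edges:
        if np.linalg.norm(U[i] - U[j]) < tol:
            parent[find(i)] = find(j)
    _, labels = np.unique([find(i) for i in range(U.shape[0])], return_inverse=True)
    return labels

# Toy data (hypothetical): two well-separated groups of 10 samples in 50 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 50)),
               rng.normal(3.0, 0.3, size=(10, 50))])
Z = pca_scores(X, k=2)
edges = knn_edges(Z, k=5)
U = convex_cluster(Z, edges, gamma=5.0)
labels = fused_labels(U, edges)
print(labels)
```

Separating the embedding step from the clustering step is what makes this sketch modular: `pca_scores` could be swapped for another embedding without touching `convex_cluster`. The paper describes a tighter coupling of the two pieces, so treat this only as a reading aid.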

Similar articles

The future of Cochrane Neonatal.
Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.
Machine-learned cluster identification in high-dimensional data.
J Biomed Inform. 2017 Feb;66:95-104. doi: 10.1016/j.jbi.2016.12.011. Epub 2016 Dec 28.
Quality Scalability Aware Watermarking for Visual Content.
IEEE Trans Image Process. 2016 Nov;25(11):5158-5172. doi: 10.1109/TIP.2016.2599785. Epub 2016 Aug 11.
