Federated unsupervised random forest for privacy-preserving patient stratification.

Affiliations

Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, 8010, Austria.

Department of Pure and Applied Sciences, University of Urbino, Urbino, 61029, Italy.

Publication information

Bioinformatics. 2024 Sep 1;40(Suppl 2):ii198-ii207. doi: 10.1093/bioinformatics/btae382.

Abstract

MOTIVATION

In precision medicine, effective patient stratification and disease subtyping require methods tailored to multi-omics data. Clustering techniques applied to multi-omics data have become instrumental in identifying distinct patient subgroups, enabling a finer-grained understanding of disease variability. Clinical datasets, however, are often small and must be aggregated from multiple hospitals, and sharing such data online is regarded as a significant challenge because of privacy concerns. This can impede the use of big data and machine learning for medical advances. This work establishes a framework for advancing precision medicine by combining unsupervised random-forest-based clustering with federated computing.

RESULTS

We introduce a new multi-omics clustering approach based on unsupervised random forests. Because the random forest is unsupervised, it can be used to determine cluster-specific feature importance, revealing the key molecular contributors to distinct patient groups. The method is designed for federated execution, which is crucial in the medical domain, where privacy concerns are paramount. We validated the approach on machine learning benchmark datasets as well as on cancer data from The Cancer Genome Atlas (TCGA). It is competitive with the state of the art in disease subtyping while substantially improving cluster interpretability. Experiments indicate that federated computing can improve clustering performance at individual (local) sites.
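The combination of forest-based clustering and federation can be illustrated with a generic random-forest proximity pipeline. The Python sketch below is not the uRF package's algorithm and rests on several assumptions: it swaps in completely random trees (random feature, random threshold within the node's value range) for whatever splitting rule uRF actually uses; it assumes a simple federated scheme in which each site grows trees on its own data and shares only the tree structures, which are pooled into one global forest; and it clusters one site's samples by average linkage on the distance 1 - proximity, where proximity is the fraction of trees in which two samples reach the same leaf. All function names (`grow_random_tree`, `train_site_forest`, `proximity`, `average_linkage`) and the toy data are hypothetical.

```python
import random
from typing import List, Optional

# Hypothetical sketch, NOT the uRF algorithm: completely random trees,
# a pooled "federated" forest, and proximity-based average-linkage clustering.
Sample = List[float]
# A tree node is either None (a leaf) or a tuple (feature, threshold, left, right).
Tree = Optional[tuple]


def grow_random_tree(data: List[Sample], rng: random.Random, depth: int = 0,
                     max_depth: int = 2, min_size: int = 2) -> Tree:
    """Grow a completely random tree: random feature, random threshold in the node's range."""
    if depth >= max_depth or len(data) < min_size:
        return None
    f = rng.randrange(len(data[0]))
    lo = min(x[f] for x in data)
    hi = max(x[f] for x in data)
    if lo == hi:
        return None  # the chosen feature is constant in this node, so stop splitting
    t = rng.uniform(lo, hi)
    left = [x for x in data if x[f] <= t]
    right = [x for x in data if x[f] > t]
    return (f, t,
            grow_random_tree(left, rng, depth + 1, max_depth, min_size),
            grow_random_tree(right, rng, depth + 1, max_depth, min_size))


def leaf_id(tree: Tree, x: Sample) -> str:
    """Identify the leaf reached by x by its left/right path from the root."""
    path = ""
    while tree is not None:
        f, t, left, right = tree
        if x[f] <= t:
            tree, path = left, path + "L"
        else:
            tree, path = right, path + "R"
    return path


def train_site_forest(local_data: List[Sample], n_trees: int, seed: int) -> List[Tree]:
    """Runs inside one site; only the returned tree structures leave the site."""
    rng = random.Random(seed)
    return [grow_random_tree(local_data, rng) for _ in range(n_trees)]


def proximity(forest: List[Tree], samples: List[Sample]) -> List[List[float]]:
    """Fraction of trees in which each pair of samples lands in the same leaf."""
    leaves = [[leaf_id(tree, x) for tree in forest] for x in samples]
    n = len(samples)
    return [[sum(a == b for a, b in zip(leaves[i], leaves[j])) / len(forest)
             for j in range(n)] for i in range(n)]


def average_linkage(dist: List[List[float]], k: int) -> List[int]:
    """Naive agglomerative clustering with average linkage, down to k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
        clusters[a] = merged
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Hypothetical toy data: two well-separated groups, split across two sites.
site1 = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2], [9.0, 9.1], [9.1, 9.0], [9.2, 9.2]]
site2 = [[0.05, 0.15], [0.15, 0.05], [9.05, 9.15], [9.15, 9.05]]

# Each site grows trees locally; the coordinator only concatenates the trees.
global_forest = train_site_forest(site1, 50, seed=1) + train_site_forest(site2, 50, seed=2)

# Site 1 clusters its own samples with the shared global forest.
prox = proximity(global_forest, site1)
dist = [[1.0 - p for p in row] for row in prox]
labels = average_linkage(dist, k=2)
print(labels)
```

With this well-separated toy data, the first three and the last three samples of site 1 are expected to fall into different clusters. Split thresholds are drawn from each site's local value ranges, so in this sketch the shared trees still carry some information about local data.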

AVAILABILITY AND IMPLEMENTATION

The proposed methods are available as an R package (https://github.com/pievos101/uRF).
