Multiple kernel learning for integrative consensus clustering of omic datasets.

Affiliations

MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 0SR, UK.

Cambridge Institute of Therapeutic Immunology & Infectious Disease, University of Cambridge, Cambridge CB2 0AW, UK.

Publication information

Bioinformatics. 2020 Sep 15;36(18):4789-4796. doi: 10.1093/bioinformatics/btaa593.

Abstract

MOTIVATION

Diverse applications, particularly in tumour subtyping, have demonstrated the importance of integrative clustering techniques for combining information from multiple data sources. Cluster Of Clusters Analysis (COCA) is one such approach that has been widely applied in the context of tumour subtyping. However, the properties of COCA have never been systematically explored, and its robustness to the inclusion of noisy datasets is unclear.

RESULTS

We rigorously benchmark COCA, and present Kernel Learning Integrative Clustering (KLIC) as an alternative strategy. KLIC frames the challenge of combining clustering structures as a multiple kernel learning problem, in which different datasets each provide a weighted contribution to the final clustering. This allows the contribution of noisy datasets to be down-weighted relative to more informative datasets. We compare the performances of KLIC and COCA in a variety of situations through simulation studies. We also present the output of KLIC and COCA in real data applications to cancer subtyping and transcriptional module discovery.
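The weighted-contribution idea can be illustrated with a minimal Python sketch. This is not the klic package API, and it does not learn the weights as KLIC does. The function names (`coclustering_kernel`, `combine_kernels`), the toy labels, and the hand-picked weights are all assumptions made for illustration. Each dataset's clustering is turned into a co-clustering matrix, which is positive semidefinite and can therefore serve as a kernel, and the matrices are combined as a convex weighted sum.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def coclustering_kernel(labels: Sequence[int]) -> Matrix:
    """Return K where K[i][j] = 1.0 if items i and j share a cluster label, else 0.0."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of kernel matrices; weights must be non-negative and sum to 1."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [[sum(w * k[i][j] for k, w in zip(kernels, weights))
             for j in range(n)]
            for i in range(n)]


# Toy cluster labels for six items (hypothetical data).
informative = [0, 0, 0, 1, 1, 1]  # two clean groups
noisy = [0, 1, 0, 1, 0, 1]        # unrelated to the true grouping

kernels = [coclustering_kernel(informative), coclustering_kernel(noisy)]

# Equal weights: the noisy dataset blurs the block structure.
equal = combine_kernels(kernels, [0.5, 0.5])

# Hand-picked weights (assumption): the noisy dataset is down-weighted.
down_weighted = combine_kernels(kernels, [0.9, 0.1])
```

With equal weights, a true within-group pair (items 0 and 1) and a spurious pair that only the noisy dataset groups together (items 0 and 4) receive the same combined similarity. With the noisy dataset down-weighted, the true pair scores much higher than the spurious one, so a clustering algorithm applied to the combined kernel would see a clearer block structure.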

AVAILABILITY AND IMPLEMENTATION

R packages klic and coca are available on the Comprehensive R Archive Network.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figure (from the article): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cc9/7750932/24dd8aafd5ab/btaa593f1.jpg
