VSClust: feature-based variance-sensitive clustering of omics data.

Affiliations

Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, Denmark.

VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Odense M, Denmark.

Publication information

Bioinformatics. 2018 Sep 1;34(17):2965-2972. doi: 10.1093/bioinformatics/bty224.

DOI:10.1093/bioinformatics/bty224

PMID:29635359

Abstract

MOTIVATION

Data clustering is indispensable for identifying biologically relevant molecular features in large-scale omics experiments with thousands of measurements at multiple conditions. Optimal clustering results yield groups of functionally related features that may include genes, proteins and metabolites in biological processes and molecular networks. Omics experiments typically include replicated measurements of each feature within a given condition to statistically assess feature-specific variation. Current clustering approaches ignore this variation by averaging, which often leads to incorrect cluster assignments.
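The effect of averaging can be illustrated with a small sketch. The feature names and replicate values below are hypothetical and are not taken from the paper; they only show that two features with very different replicate variation can collapse to identical mean profiles, so a clustering method that sees only the means cannot tell them apart.

```python
from statistics import mean, stdev

# Hypothetical replicate measurements (3 replicates per condition).
replicates = {
    "feature_A": {"cond1": [0.75, 1.0, 1.25], "cond2": [1.75, 2.0, 2.25]},
    "feature_B": {"cond1": [0.0, 1.0, 2.0], "cond2": [1.0, 2.0, 3.0]},
}

def summarize(feature):
    """Return per-condition means and replicate standard deviations."""
    means = {cond: mean(values) for cond, values in feature.items()}
    sds = {cond: stdev(values) for cond, values in feature.items()}
    return means, sds

means_a, sds_a = summarize(replicates["feature_A"])
means_b, sds_b = summarize(replicates["feature_B"])

# The averaged profiles are identical, but the replicate spread is not.
print(means_a == means_b)
print(sds_a["cond1"], sds_b["cond1"])
```

Here feature_B is four times noisier than feature_A in each condition, yet both would enter a mean-based clustering as the same profile.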

RESULTS

We present VSClust that accounts for feature-specific variance. Based on an algorithm derived from fuzzy clustering, VSClust unifies statistical testing with pattern recognition to cluster the data into feature groups that more accurately reflect the underlying molecular and functional behavior. We apply VSClust to artificial and experimental datasets comprising hundreds to >80 000 features across 6-20 different conditions including genomics, transcriptomics, proteomics and metabolomics experiments. VSClust avoids arbitrary averaging methods, outperforms standard fuzzy c-means clustering and simplifies the data analysis workflow in large-scale omics studies.
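The abstract does not give the update equations, so the following is only a minimal sketch of the membership formula from standard fuzzy c-means, the method VSClust is said to be derived from. The function name `memberships`, the example coordinates, and the idea of passing a different fuzzifier for a noisier feature are assumptions made here for illustration; the paper should be consulted for how VSClust actually incorporates feature variance.

```python
import math

def memberships(point, centroids, fuzzifier):
    """Standard fuzzy c-means membership of one feature in each cluster.

    `fuzzifier` must be greater than 1; larger values give softer assignments.
    """
    dists = [math.dist(point, c) for c in centroids]
    # A feature sitting exactly on a centroid belongs fully to that centroid.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (fuzzifier - 1.0)
    return [1.0 / sum((d_k / d_j) ** exponent for d_j in dists) for d_k in dists]

# Hypothetical two-cluster example: the same profile under two fuzzifier values.
centroids = [(0.0, 0.0), (4.0, 0.0)]
profile = (1.0, 0.0)

crisp = memberships(profile, centroids, fuzzifier=2.0)  # roughly [0.9, 0.1]
soft = memberships(profile, centroids, fuzzifier=3.0)   # roughly [0.75, 0.25]
print(crisp, soft)
```

With the larger fuzzifier the same profile is assigned less decisively to its nearest cluster, which is one conceivable way to keep high-variance features from being forced into a cluster.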

AVAILABILITY AND IMPLEMENTATION

Download VSClust at https://bitbucket.org/veitveit/vsclust or access it through computproteomics.bmb.sdu.dk/Apps/VSClust.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

VSClust: feature-based variance-sensitive clustering of omics data.

Bioinformatics. 2018 Sep 1;34(17):2965-2972. doi: 10.1093/bioinformatics/bty224.

MetaboLink: A web application for Streamlined Processing and Analysis of Large-Scale Untargeted Metabolomics Data.

Bioinformatics. 2024 Jul 17;40(7). doi: 10.1093/bioinformatics/btae459.

A Tutorial for Variance-Sensitive Clustering and the Quantitative Analysis of Protein Complexes.

Methods Mol Biol. 2021;2228:433-451. doi: 10.1007/978-1-0716-1024-4_30.

Translational Metabolomics of Head Injury: Exploring Dysfunctional Cerebral Metabolism with Ex Vivo NMR Spectroscopy-Based Metabolite Quantification

scMNMF: a novel method for single-cell multi-omics clustering based on matrix factorization.

Brief Bioinform. 2024 Mar 27;25(3). doi: 10.1093/bib/bbae228.

RepExplore: addressing technical replicate variance in proteomics and metabolomics data analysis.

Bioinformatics. 2015 Jul 1;31(13):2235-7. doi: 10.1093/bioinformatics/btv127. Epub 2015 Feb 25.

VIQoR: a web service for visually supervised protein inference and protein quantification.

Bioinformatics. 2022 May 13;38(10):2757-2764. doi: 10.1093/bioinformatics/btac182.

NEMO: cancer subtyping by integration of partial multi-omic data.

Bioinformatics. 2019 Sep 15;35(18):3348-3356. doi: 10.1093/bioinformatics/btz058.

DiviK: divisive intelligent K-means for hands-free unsupervised clustering in big biological data.

BMC Bioinformatics. 2022 Dec 12;23(1):538. doi: 10.1186/s12859-022-05093-z.

Multiview: a software package for multiview pattern recognition methods.

Bioinformatics. 2019 Aug 15;35(16):2877-2879. doi: 10.1093/bioinformatics/bty1039.

Cited by

Circadian ontogenetic metabolomics atlas: an interactive resource with insights from rat plasma, tissues, and feces.

Cell Mol Life Sci. 2025 Jun 28;82(1):264. doi: 10.1007/s00018-025-05783-w.

Mapping the Interactome of OSCC Prognostic-Associated Proteins NDRG1 and PGK1 Through Proximity Labeling Using TurboID.

J Proteome Res. 2025 Jun 6;24(6):2741-2756. doi: 10.1021/acs.jproteome.4c01039. Epub 2025 Apr 30.

Post-testicular spermatozoa of a marine teleost can conduct cytoplasmic and mitochondrial translation.

iScience. 2024 Dec 6;28(1):111537. doi: 10.1016/j.isci.2024.111537. eCollection 2025 Jan 17.

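MetaboLink: A web application for Streamlined Processing and Analysis of Large-Scale Untargeted Metabolomics Data.

Bioinformatics. 2024 Jul 17;40(7). doi: 10.1093/bioinformatics/btae459.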

Identification of structural and regulatory cell-shape determinants in Haloferax volcanii.

Nat Commun. 2024 Feb 15;15(1):1414. doi: 10.1038/s41467-024-45196-0.

Dissecting tumor microenvironment heterogeneity in syngeneic mouse models: insights on cancer-associated fibroblast phenotypes shaped by infiltrating T cells.

Front Immunol. 2024 Jan 8;14:1320614. doi: 10.3389/fimmu.2023.1320614. eCollection 2023.

Statistical Analysis of Post-Translational Modifications Quantified by Label-Free Proteomics Across Multiple Biological Conditions with R: Illustration from SARS-CoV-2 Infected Cells.

Methods Mol Biol. 2023;2426:267-302. doi: 10.1007/978-1-0716-1967-4_12.

Proteomic Profile of Procoagulant Extracellular Vesicles Reflects Complement System Activation and Platelet Hyperreactivity of Patients with Severe COVID-19.

Front Cell Infect Microbiol. 2022 Jul 22;12:926352. doi: 10.3389/fcimb.2022.926352. eCollection 2022.

Aag-2 Cell Proteome Modulation in Response to Chikungunya Virus Infection.

Front Cell Infect Microbiol. 2022 Jun 15;12:920425. doi: 10.3389/fcimb.2022.920425. eCollection 2022.

Diel investments in metabolite production and consumption in a model microbial system.

ISME J. 2022 May;16(5):1306-1317. doi: 10.1038/s41396-021-01172-w. Epub 2021 Dec 17.
