VSClust: feature-based variance-sensitive clustering of omics data.

Affiliations

Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense M, Denmark.

VILLUM Center for Bioanalytical Sciences, University of Southern Denmark, Odense M, Denmark.

Publication information

Bioinformatics. 2018 Sep 1;34(17):2965-2972. doi: 10.1093/bioinformatics/bty224.

Abstract

MOTIVATION

Data clustering is indispensable for identifying biologically relevant molecular features in large-scale omics experiments with thousands of measurements at multiple conditions. Optimal clustering results yield groups of functionally related features that may include genes, proteins and metabolites in biological processes and molecular networks. Omics experiments typically include replicated measurements of each feature within a given condition to statistically assess feature-specific variation. Current clustering approaches ignore this variation by averaging, which often leads to incorrect cluster assignments.

RESULTS

We present VSClust, which accounts for feature-specific variance. Based on an algorithm derived from fuzzy clustering, VSClust unifies statistical testing with pattern recognition to cluster the data into feature groups that more accurately reflect the underlying molecular and functional behavior. We apply VSClust to artificial and experimental datasets comprising hundreds to >80 000 features across 6-20 different conditions, including genomics, transcriptomics, proteomics and metabolomics experiments. VSClust avoids arbitrary averaging methods, outperforms standard fuzzy c-means clustering and simplifies the data analysis workflow in large-scale omics studies.
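To make the idea of variance-sensitive fuzzy clustering more concrete, the following is a minimal sketch in plain Python, not the VSClust implementation. It assumes that variance sensitivity can be expressed by giving each feature its own fuzzifier in an otherwise standard fuzzy c-means update, so that noisier features receive softer cluster memberships. The function names, the toy profiles, the fuzzifier values and the choice of initial centers are all hypothetical and chosen only for illustration; how a fuzzifier would be derived from replicate variance is not specified here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(profile, centers, m):
    """Fuzzy c-means memberships of one profile, using its own fuzzifier m > 1."""
    d2 = [squared_distance(profile, c) for c in centers]
    hits = [k for k, d in enumerate(d2) if d == 0.0]
    if hits:  # the profile sits exactly on one or more centers
        return [1.0 / len(hits) if k in hits else 0.0 for k in range(len(centers))]
    exponent = 1.0 / (m - 1.0)  # exponent applied to squared distances
    inv = [(1.0 / d) ** exponent for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def fuzzy_cmeans(profiles, fuzzifiers, centers, n_iter=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = [list(c) for c in centers]
    n_dims = len(profiles[0])
    for _ in range(n_iter):
        u = [memberships(p, centers, m) for p, m in zip(profiles, fuzzifiers)]
        for k in range(len(centers)):
            # Each profile's weight uses that profile's own fuzzifier (assumption).
            w = [u_i[k] ** m for u_i, m in zip(u, fuzzifiers)]
            total = sum(w)
            if total > 0.0:  # keep the old center if no profile supports it
                centers[k] = [
                    sum(w_i * p[d] for w_i, p in zip(w, profiles)) / total
                    for d in range(n_dims)
                ]
    u = [memberships(p, centers, m) for p, m in zip(profiles, fuzzifiers)]
    return centers, u


# Hypothetical toy data: six feature profiles measured under two conditions.
profiles = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2],
            [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
# Hypothetical fuzzifiers: 3.0 marks a feature assumed to have noisier replicates.
fuzzifiers = [2.0, 2.0, 3.0, 2.0, 2.0, 3.0]

centers, u = fuzzy_cmeans(profiles, fuzzifiers, [profiles[0], profiles[3]])
for p, row in zip(profiles, u):
    print(p, [round(v, 3) for v in row])
```

In this sketch a larger fuzzifier flattens the membership values: for a profile at (1, 0) with centers at (0, 0) and (4, 0), a fuzzifier of 2 gives memberships of roughly 0.9 and 0.1, while a fuzzifier of 4 gives a less decisive split. Features with ambiguous memberships could then be left unassigned rather than forced into a cluster, which is one way to read the abstract's claim of combining statistical testing with pattern recognition.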

AVAILABILITY AND IMPLEMENTATION

Download VSClust at https://bitbucket.org/veitveit/vsclust or access it through computproteomics.bmb.sdu.dk/Apps/VSClust.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
