Statistical significance of clustering for count data.

Authors

Dai Yifan, Wu Di, Liu Yufeng

Affiliations

Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Department of Biomedical Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.

Publication information

Biometrics. 2025 Jul 3;81(3). doi: 10.1093/biomtc/ujaf120.

Abstract

Clustering is widely used in biomedical research for meaningful subgroup identification. However, most existing clustering algorithms do not account for the statistical uncertainty of the resulting clusters and consequently may generate spurious clusters due to natural sampling variation. To address this problem, the Statistical Significance of Clustering (SigClust) method was developed to evaluate the significance of clusters in high-dimensional data. While SigClust has been successful in assessing clustering significance for continuous data, it is not specifically designed for discrete data, such as count data in genomics. Moreover, SigClust and its variations can suffer from reduced statistical power when applied to non-Gaussian high-dimensional data. To overcome these limitations, we propose SigClust-DEV, a method designed to evaluate the significance of clusters in count data. Through extensive simulations, we compare SigClust-DEV against other existing SigClust approaches across various count distributions and demonstrate its superior performance. Furthermore, we apply our proposed SigClust-DEV to Hydra single-cell RNA sequencing (scRNA) data and electronic health records (EHRs) of cancer patients to identify meaningful latent cell types and patient subgroups, respectively.
