Testing for a difference in means of a single feature after clustering.

Authors

Chen Yiqun T, Gao Lucy L

Affiliations

Department of Biomedical Data Science, Stanford University.

Department of Statistics, University of British Columbia, November 29, 2023.

Publication information

ArXiv. 2023 Nov 27:arXiv:2311.16375v1.

PMID: 38076519
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10705581/
Abstract

For many applications, it is critical to interpret and validate groups of observations obtained via clustering. A common validation approach involves testing differences in feature means between observations in two estimated clusters. In this setting, classical hypothesis tests lead to an inflated Type I error rate. To overcome this problem, we propose a new test for the difference in means in a single feature between a pair of clusters obtained using hierarchical or k-means clustering. The test based on the proposed p-value controls the selective Type I error rate in finite samples and can be efficiently computed. We further illustrate the validity and power of our proposal in simulation and demonstrate its use on single-cell RNA-sequencing data.
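
The inflation of the Type I error rate described in the abstract can be seen in a small simulation. The sketch below is not the authors' proposed test. It only illustrates the problem that motivates it, under assumptions chosen here for simplicity: data drawn from a single Gaussian with known unit variance (so no true clusters exist), a plain two-cluster Lloyd's algorithm, and a classical two-sample z-test on the first feature. The function names `two_means`, `naive_z_pvalue`, and `rejection_rate` are hypothetical helpers, not part of any published package.

```python
import math
import numpy as np


def two_means(X, rng, n_iter=50):
    """Plain Lloyd's algorithm with k = 2; returns 0/1 cluster labels."""
    centers = X[rng.choice(len(X), size=2, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to both centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        if labels.min() == labels.max():  # one cluster emptied; stop early
            break
        new_centers = np.array([X[labels == j].mean(axis=0) for j in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def naive_z_pvalue(x, labels, sigma=1.0):
    """Classical two-sided z-test for a difference in means of one feature.

    Treats the labels as fixed in advance, which is exactly the assumption
    that fails when the labels were estimated from the same data.
    """
    n0 = int((labels == 0).sum())
    n1 = int((labels == 1).sum())
    if n0 == 0 or n1 == 0:
        return 1.0
    diff = x[labels == 1].mean() - x[labels == 0].mean()
    z = diff / (sigma * math.sqrt(1.0 / n0 + 1.0 / n1))
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rate(use_clustering, n_sims=500, n=50, q=2, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects at level alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        X = rng.standard_normal((n, q))  # global null: one Gaussian, no clusters
        if use_clustering:
            labels = two_means(X, rng)
        else:
            labels = rng.integers(0, 2, size=n)  # groups chosen independently of X
        rejections += naive_z_pvalue(X[:, 0], labels) < alpha
    return rejections / n_sims


if __name__ == "__main__":
    print("naive test, groups from k-means :", rejection_rate(True))
    print("naive test, groups independent  :", rejection_rate(False))
```

When the groups are chosen independently of the data, the rejection rate should sit near the nominal level; when the groups come from clustering the same data, it is typically far higher, because the clustering step itself pushes the group means apart.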

Similar articles

1
Testing for a difference in means of a single feature after clustering.
ArXiv. 2023 Nov 27:arXiv:2311.16375v1.
2
Testing for a difference in means of a single feature after clustering.
Biostatistics. 2024 Dec 31;26(1). doi: 10.1093/biostatistics/kxae046.
3
Selective inference for k-means clustering.
J Mach Learn Res. 2023 May;24.
4
Selective Inference for Hierarchical Clustering.
J Am Stat Assoc. 2024;119(545):332-342. doi: 10.1080/01621459.2022.2116331. Epub 2022 Oct 11.
5
Statistical significance for hierarchical clustering.
Biometrics. 2017 Sep;73(3):811-821. doi: 10.1111/biom.12647. Epub 2017 Jan 18.
6
Quantifying uncertainty in spikes estimated from calcium imaging data.
Biostatistics. 2023 Apr 14;24(2):481-501. doi: 10.1093/biostatistics/kxab034.
7
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
8
Penalized unsupervised learning with outliers.
Stat Interface. 2013;6(2):211-221. doi: 10.4310/sii.2013.v6.n2.a5.
9
Development of a CT radiomics prognostic model for post renal tumor resection overall survival based on transformer enhanced K-means clustering.
Med Phys. 2025 May;52(5):3243-3257. doi: 10.1002/mp.17639. Epub 2025 Jan 27.
10
A Cheap Feature Selection Approach for the K-Means Algorithm.
IEEE Trans Neural Netw Learn Syst. 2021 May;32(5):2195-2208. doi: 10.1109/TNNLS.2020.3002576. Epub 2021 May 3.
