Clustering in General Measurement Error Models

Authors

Su Ya, Reedy Jill, Carroll Raymond J

Affiliations

Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143.

Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, MD 20892.

Publication information

Stat Sin. 2018 Oct;28(4):2337-2351.

PMID: 30636855

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6329467/

Abstract

This paper is dedicated to the memory of Peter G. Hall. It concerns a deceptively simple question: if one observes variables corrupted with measurement error of possibly very complex form, can one recreate asymptotically the clusters that would have been found had there been no measurement error? We show that the answer is yes, and that the solution is surprisingly simple and general. The method itself is to simulate, by computer, realizations with the same distribution as that of the true variables, and then to apply clustering to these realizations. Technically, we show that if one uses K-means clustering or any other risk minimizing clustering, and a multivariate deconvolution device with certain smoothness and convergence properties, then, in the limit, the cluster means based on our method converge to the same cluster means as if there is no measurement error. Along with the method and its technical justification, we analyze two important nutrition data sets, finding patterns that make sense nutritionally.
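The simulate-then-cluster idea in the abstract can be illustrated with a deliberately small sketch. This is not the authors' procedure: the abstract calls for a multivariate deconvolution device, whereas the toy below assumes one dimension, a normal model for the true variable, classical additive normal error, and a known error variance `sigma_u`. The helper `kmeans_1d`, the sample size, and all parameter values are hypothetical choices made only for illustration.

```python
import random
import statistics

def kmeans_1d(xs, k=2, iters=30):
    """Plain Lloyd's algorithm in one dimension; returns sorted cluster means."""
    xs = sorted(xs)
    # Initialise the centres at evenly spaced quantiles of the data.
    centers = [xs[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            groups[nearest].append(x)
        # Keep the old centre if a cluster happens to be empty.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return sorted(centers)

rng = random.Random(0)
n = 5000
sigma_x, sigma_u = 1.0, 1.0  # illustrative values; sigma_u is treated as known

# True variables X (unobserved in practice) and error-prone observations W = X + U.
x = [rng.gauss(0.0, sigma_x) for _ in range(n)]
w = [xi + rng.gauss(0.0, sigma_u) for xi in x]

# Toy "deconvolution": under the assumed normal model, Var(X) = Var(W) - sigma_u^2.
mu_hat = sum(w) / n
var_x_hat = max(statistics.variance(w) - sigma_u ** 2, 1e-8)

# Simulate realizations from the estimated distribution of X, then cluster them.
x_sim = [rng.gauss(mu_hat, var_x_hat ** 0.5) for _ in range(n)]

centers_true = kmeans_1d(x)      # benchmark: clustering the error-free data
centers_naive = kmeans_1d(w)     # clustering the error-prone data directly
centers_sim = kmeans_1d(x_sim)   # clustering the simulated realizations

err_naive = max(abs(a - b) for a, b in zip(centers_naive, centers_true))
err_sim = max(abs(a - b) for a, b in zip(centers_sim, centers_true))
print("error-free:", centers_true)
print("naive     :", centers_naive, "max abs gap:", err_naive)
print("simulated :", centers_sim, "max abs gap:", err_sim)
```

In this setting the naive cluster means are spread too far apart because measurement error inflates the variance of W, while the means from the simulated realizations should sit closer to the error-free benchmark. A realistic analysis would replace the normal-model step with a deconvolution estimator suited to the data.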


