Summarizing Finite Mixture Model with Overlapping Quantification.

Authors

Kyoya Shunki, Yamanishi Kenji

Affiliation

Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan.

Publication information

Entropy (Basel). 2021 Nov 13;23(11):1503. doi: 10.3390/e23111503.

DOI: 10.3390/e23111503
PMID: 34828201
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8622449/
Abstract

Finite mixture models are widely used for modeling and clustering data. When they are used for clustering, they are often interpreted by regarding each component as one cluster. However, this assumption may be invalid when the components overlap. It leads to the issue of analyzing such overlaps to correctly understand the models. The primary purpose of this paper is to establish a theoretical framework for interpreting the overlapping mixture models by estimating how they overlap, using measures of information such as entropy and mutual information. This is achieved by merging components to regard multiple components as one cluster and summarizing the merging results. First, we propose three conditions that any merging criterion should satisfy. Then, we investigate whether several existing merging criteria satisfy the conditions and modify them to fulfill more conditions. Second, we propose a novel concept named clustering summarization to evaluate the merging results. In it, we can quantify how overlapped and biased the clusters are, using mutual information-based criteria. Using artificial and real datasets, we empirically demonstrate that our methods of modifying criteria and summarizing results are effective for understanding the cluster structures. We therefore give a new view of interpretability/explainability for model-based clustering.
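The abstract describes merging overlapping components and quantifying the overlap with entropy and mutual information, but it does not state the formulas. The following is a minimal sketch of one common entropy-based approach, not the criteria proposed in the paper. It assumes a 1-D Gaussian mixture with known parameters and greedily picks the pair of components whose merge lowers the mean posterior entropy the most. All function names and the toy parameters are hypothetical and chosen for illustration only.

```python
import numpy as np

def responsibilities(x, weights, means, stds):
    """Posterior probability of each 1-D Gaussian component for each point."""
    x = np.asarray(x, dtype=float)[:, None]
    dens = np.exp(-0.5 * ((x - means) / stds) ** 2) / (stds * np.sqrt(2 * np.pi))
    joint = weights * dens
    return joint / joint.sum(axis=1, keepdims=True)

def entropy(p, axis=-1):
    """Shannon entropy in nats, treating 0 * log 0 as 0."""
    p = np.asarray(p, dtype=float)
    safe = np.where(p > 0, p, 1.0)
    return -(p * np.log(safe)).sum(axis=axis)

def mutual_information(resp):
    """Plug-in estimate of I(X; Z) = H(Z) - H(Z | X) from posteriors."""
    return entropy(resp.mean(axis=0)) - entropy(resp, axis=1).mean()

def merge(resp, i, j):
    """Treat components i and j as one cluster by adding their posteriors."""
    keep = [k for k in range(resp.shape[1]) if k not in (i, j)]
    return np.column_stack([resp[:, keep], resp[:, i] + resp[:, j]])

def best_merge(resp):
    """Pair whose merge lowers the mean conditional entropy H(Z | X) the most."""
    base = entropy(resp, axis=1).mean()
    k = resp.shape[1]
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return max(pairs, key=lambda p: base - entropy(merge(resp, *p), axis=1).mean())

# Toy mixture (assumed values): components 0 and 1 overlap, component 2 is far away.
rng = np.random.default_rng(0)
weights = np.array([0.3, 0.3, 0.4])
means = np.array([0.0, 1.0, 8.0])
stds = np.array([1.0, 1.0, 1.0])
labels = rng.choice(3, size=2000, p=weights)
x = rng.normal(means[labels], stds[labels])

resp = responsibilities(x, weights, means, stds)
pair = best_merge(resp)
merged = merge(resp, *pair)
h_before = entropy(resp, axis=1).mean()
h_after = entropy(merged, axis=1).mean()
print("merged pair:", pair)
print("H(Z|X) before and after:", h_before, h_after)
print("I(X;Z) before and after:", mutual_information(resp), mutual_information(merged))
```

In this sketch a large conditional entropy H(Z | X) signals that points cannot be assigned to a single component with confidence, which is one way to read "overlap". Comparing I(X; Z) before and after a merge shows how much cluster information the data still carries once the overlapping components are treated as one cluster.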
