Model-Based Clustering with Measurement or Estimation Errors.

Affiliation

Department of Statistics, Oregon State University, Corvallis, OR 97330, USA.

Publication information

Genes (Basel). 2020 Feb 10;11(2):185. doi: 10.3390/genes11020185.

DOI:10.3390/genes11020185
PMID:32050700
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7074130/
Abstract

Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When the objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling, called MCLUST-ME, that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we found that, under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data compared with simply ignoring the errors; the degree of improvement depends on factors such as the distribution of the error covariance matrices.
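
The dependence of the classification boundary on the error covariance can be illustrated with a small one-dimensional sketch. It is not the MCLUST-ME implementation. It assumes the measurement error is zero-mean Gaussian with a known variance, so that an observation from component k has variance equal to the component variance plus its own error variance. The mixture parameters, the function names, and the test point 1.6 are all made up for illustration.

```python
import math

# Hypothetical two-component mixture (illustrative values, not from the paper).
WEIGHTS = [0.5, 0.5]      # mixing proportions
MEANS = [0.0, 3.0]        # component means
VARIANCES = [1.0, 4.0]    # variances of the underlying true components


def normal_pdf(y, mean, var):
    """Density of a univariate normal distribution with the given variance."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(y, error_var):
    """Posterior membership probabilities for one observation.

    Assumption: the error is independent, zero-mean Gaussian with known
    variance error_var, so the observation's variance under component k
    is VARIANCES[k] + error_var.
    """
    joint = [
        w * normal_pdf(y, m, v + error_var)
        for w, m, v in zip(WEIGHTS, MEANS, VARIANCES)
    ]
    total = sum(joint)
    return [j / total for j in joint]


def classify(y, error_var):
    """Index of the component with the largest posterior probability."""
    probs = responsibilities(y, error_var)
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    y = 1.6
    for error_var in (0.0, 5.0):
        print(error_var, classify(y, error_var), responsibilities(y, error_var))
```

With these made-up parameters, the same observed value 1.6 is assigned to the second component when the error variance is 0 and to the first component when the error variance is 5, which is the sense in which each error variance carries its own boundary. Setting the error variance to 0 for every observation reduces the rule to an ordinary Gaussian mixture classification.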
