Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering.

Authors

de Brevern Alexandre G, Hazout Serge, Malpertuy Alain

Affiliation

Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM E0346, Université Denis DIDEROT-Paris 7, case 7113, 2, place Jussieu, 75251 Paris, France.

Publication information

BMC Bioinformatics. 2004 Aug 23;5:114. doi: 10.1186/1471-2105-5-114.

DOI: 10.1186/1471-2105-5-114
PMID: 15324460
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC514701/
Abstract

BACKGROUND

Microarray technologies produce large amounts of data. Hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray datasets often contain missing values (MVs), which are a major drawback for clustering methods. Usually the MVs are left untreated, replaced by zero, or estimated by the k-Nearest Neighbor (kNN) approach. This paper studies the stability of gene clusters, defined by various hierarchical clustering algorithms, in microarray experiments with and without MVs.
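As a rough illustration of the kNN idea mentioned above, the following minimal sketch fills each missing entry with the mean of the values observed in the k most similar genes. The function name `knn_impute`, the distance (root-mean-square difference over shared conditions), the unweighted mean, and the toy matrix are all assumptions made for illustration; they are not taken from the paper and may differ from the procedure the authors used.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k nearest genes (rows).

    Assumption: distance is the root-mean-square difference over the
    conditions (columns) observed in both rows.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no shared condition: rows cannot be compared
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [row[:] for row in matrix]  # copy, so the input stays untouched
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other genes with an observed value in column j.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            ]
            candidates = [c for c in candidates if c[0] < math.inf]
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
            else:
                filled[i][j] = 0.0  # fallback: zero replacement
    return filled

# Hypothetical toy data: 4 genes (rows) x 3 conditions (columns).
expression = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [0.9, 2.2, 2.9],
    [5.0, 6.0, 7.0],
]
imputed = knn_impute(expression, k=2)
print(imputed[1])
```

In this toy case the two genes closest to the incomplete row are the first and third, so the gap is filled with the average of their values in that condition, while the distant fourth gene is ignored.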

RESULTS

In this study, we show that MVs have important effects on the stability of gene clusters. Moreover, the magnitude of the gene misallocations depends on the aggregation algorithm. The most appropriate aggregation methods (e.g. complete-linkage and Ward) are highly sensitive to MVs, surprisingly even for a very small proportion of MVs (e.g. 1%). In most cases, the MVs must be replaced by expected values. Replacing MVs with the kNN approach clearly improves the identification of co-expressed gene clusters. Nevertheless, we observe that the kNN approach is less suitable for extreme values of gene expression.
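One simple way to put a number on cluster stability is to ask what fraction of gene pairs that share a cluster in a reference partition still share a cluster after the data have been perturbed (for example by introducing MVs). The sketch below is an assumption-laden illustration: the function name `conserved_pair_fraction` and the label vectors are hypothetical, and the abstract does not state that the authors used this particular measure.

```python
from itertools import combinations

def conserved_pair_fraction(reference, perturbed):
    """Fraction of gene pairs co-clustered in `reference` that remain
    co-clustered in `perturbed`. Both arguments are lists of cluster
    labels, one label per gene, in the same gene order."""
    if len(reference) != len(perturbed):
        raise ValueError("label lists must have the same length")
    pairs_together = [
        (a, b)
        for a, b in combinations(range(len(reference)), 2)
        if reference[a] == reference[b]
    ]
    if not pairs_together:
        return 1.0  # nothing to conserve, so nothing was lost
    kept = sum(1 for a, b in pairs_together if perturbed[a] == perturbed[b])
    return kept / len(pairs_together)

# Hypothetical labels: gene 2 moves from cluster 0 to cluster 1.
reference_labels = [0, 0, 0, 1, 1, 2]
perturbed_labels = [0, 0, 1, 1, 1, 2]
score = conserved_pair_fraction(reference_labels, perturbed_labels)
print(score)
```

Here a single misallocated gene breaks two of the four reference pairs, which shows how even a small perturbation can lower a pair-based stability score noticeably.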

CONCLUSION

The presence of MVs (even at a low rate) is a major factor in gene cluster instability. In addition, the impact depends on the hierarchical clustering algorithm used. Some methods should be used carefully. Nevertheless, the kNN approach is an efficient method for restoring missing gene expression values, with a low error level. Our study highlights the need for statistical treatment of microarray data to avoid misinterpretation.
