Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments.

Affiliation

INSERM UMR-S 726, Equipe de Bioinformatique Génomique et Moléculaire, DSIMB, Université Paris Diderot-Paris 7, 2 place Jussieu, Paris, France.

Publication details

BMC Genomics. 2010 Jan 7;11:15. doi: 10.1186/1471-2164-11-15.

DOI:10.1186/1471-2164-11-15

PMID:20056002

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2827407/

Abstract

BACKGROUND

Microarray technologies produce large amounts of data. In a previous study, we showed the value of the k-Nearest Neighbour (kNN) approach for restoring missing gene expression values, and its positive impact on gene clustering by a hierarchical algorithm. Since then, numerous replacement methods have been proposed to impute missing values (MVs) in microarray data. In this study, we evaluated twelve different available methods and their influence on the quality of gene clustering. Notably, we used several datasets, covering both kinetic and non-kinetic experiments from yeast and human.
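The kNN approach to restoring missing expression values can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `knn_impute`, the default `k=10`, the root-mean-square distance computed over arrays observed in both genes, and the inverse-distance weighting are assumptions modelled on the usual weighted-neighbour scheme. Genes are rows, arrays are columns, and `NaN` marks an MV.

```python
import numpy as np

def knn_impute(X, k=10):
    """Hypothetical kNN imputer: fill each NaN in X (genes x arrays) with an
    inverse-distance weighted mean of the k most similar genes that have a
    value in that array."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    out = X.copy()
    n = X.shape[0]
    for i in np.where(miss.any(axis=1))[0]:
        obs_i = ~miss[i]
        # Distance to every other gene, using only arrays observed in both genes.
        d = np.full(n, np.inf)
        for g in range(n):
            if g == i:
                continue
            shared = obs_i & ~miss[g]
            if shared.any():
                diff = X[i, shared] - X[g, shared]
                d[g] = np.sqrt(np.mean(diff ** 2))
        for j in np.where(miss[i])[0]:
            # Candidate neighbours must have an observed value in array j.
            cand = np.where(~miss[:, j] & np.isfinite(d))[0]
            if cand.size == 0:
                # Fallback assumption: column mean (NaN if the whole column is missing).
                out[i, j] = np.nanmean(X[:, j])
                continue
            nn = cand[np.argsort(d[cand])[:k]]
            w = 1.0 / (d[nn] + 1e-12)  # small constant avoids division by zero
            out[i, j] = np.sum(w * X[nn, j]) / np.sum(w)
    return out
```

Calling `knn_impute(X, k=10)` on a genes × arrays NumPy array returns a filled copy and leaves observed entries unchanged; the choice of `k` and of the distance are the main tuning knobs in this kind of method.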

RESULTS

We highlight the excellent efficiency of the approaches proposed and implemented by Bo and co-workers, especially the one based on expectation maximization (EM_array). These improvements were also observed for the imputation of extreme values, which are the most difficult values to predict. We showed that imputed MVs still have important effects on the stability of gene clusters. The improvement obtained with hierarchical clustering remains limited and is not sufficient to fully restore the correct gene associations. However, a common trend can be found between the quality of the imputation method and gene cluster stability. Although comparing clustering algorithms is a complex task, we observed that the k-means approach is more efficient at conserving gene associations.
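Bo and co-workers' EM_array implementation is not described here, so the block below is only a generic sketch of expectation-maximization imputation. It assumes a multivariate Gaussian model in which arrays (columns) are the variables and genes (rows) are the observations; the name `em_impute`, the column-mean starting point, the `1e-6` ridge term and the stopping rule are hypothetical choices. The E-step replaces each gene's missing entries with their conditional mean given its observed entries; the M-step re-estimates the mean and covariance, adding back the conditional covariance of the filled entries so that the covariance is not underestimated.

```python
import numpy as np

def em_impute(X, max_iter=100, tol=1e-6):
    """Hypothetical EM imputer for X (genes x arrays) under a multivariate
    Gaussian model over arrays. Returns a filled copy of X."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    n, p = X.shape
    # Initialise missing entries with column (array) means.
    mu = np.nanmean(X, axis=0)
    Xf = np.where(miss, mu, X)
    sigma = np.cov(Xf, rowvar=False, bias=True) + 1e-6 * np.eye(p)
    for _ in range(max_iter):
        C = np.zeros((p, p))  # accumulated conditional covariance of filled entries
        # E-step: conditional mean of missing entries given observed ones.
        for i in range(n):
            m = miss[i]
            if not m.any():
                continue
            o = ~m
            if not o.any():
                Xf[i] = mu
                C += sigma
                continue
            s_oo = sigma[np.ix_(o, o)]
            s_mo = sigma[np.ix_(m, o)]
            coef = np.linalg.solve(s_oo, s_mo.T).T  # equals s_mo @ inv(s_oo)
            Xf[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
            C[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
        # M-step: update mean and covariance from the completed data.
        new_mu = Xf.mean(axis=0)
        diff = Xf - new_mu
        new_sigma = (diff.T @ diff + C) / n + 1e-6 * np.eye(p)
        converged = (np.max(np.abs(new_mu - mu)) < tol
                     and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return Xf
```

Because each missing value is predicted from correlated arrays rather than from a single column average, this kind of model tends to help most when arrays are strongly correlated, as in many expression datasets.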

CONCLUSIONS

More than 6,000,000 independent simulations assessed the quality of 12 imputation methods on five very different biological datasets. Important improvements have thus been made since our last study. The EM_array approach is an efficient method for restoring missing gene expression values, with a lower estimation error. Nonetheless, the presence of MVs, even at a low rate, is a major factor of gene cluster instability. Our study highlights the need for a systematic assessment of imputation methods and therefore for dedicated benchmarks. A noteworthy point is the specific influence of some biological datasets.
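A simulation of this kind typically hides known values, imputes them and scores the result. The sketch below illustrates that loop under stated assumptions, not the study's exact design: the masking `rate`, the number of runs, the RMSE score and the row-mean baseline (`row_mean_impute`) are hypothetical stand-ins, and any imputer taking and returning a genes × arrays matrix could be passed instead.

```python
import numpy as np

def mask_values(X, rate, rng):
    """Hide a random fraction `rate` of the entries of a complete matrix X."""
    mask = rng.random(X.shape) < rate
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask

def row_mean_impute(X):
    """Baseline imputer: replace each NaN by the mean of its gene (row)."""
    row_means = np.nanmean(X, axis=1, keepdims=True)
    return np.where(np.isnan(X), row_means, X)

def rmse_on_masked(X_true, X_imputed, mask):
    """Root-mean-squared error restricted to the artificially hidden entries."""
    diff = X_true[mask] - X_imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def simulate(X, impute, rate=0.05, n_runs=100, seed=0):
    """Repeat mask -> impute -> score and return the mean RMSE over runs."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_runs):
        X_missing, mask = mask_values(X, rate, rng)
        if not mask.any():
            continue  # nothing was hidden in this run
        scores.append(rmse_on_masked(X, impute(X_missing), mask))
    return float(np.mean(scores))
```

Comparing `simulate(X, some_imputer)` across imputers and masking rates on the same complete matrix gives a like-for-like ranking of estimation error.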

Similar articles

Influence of microarrays experiments missing values on the stability of gene groups by hierarchical clustering.

BMC Bioinformatics. 2004 Aug 23;5:114. doi: 10.1186/1471-2105-5-114.

Towards clustering of incomplete microarray data without the use of imputation.

Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31.

Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24.

Missing value imputation for microarray data: a comprehensive comparison study and a web tool.

BMC Syst Biol. 2013;7 Suppl 6(Suppl 6):S12. doi: 10.1186/1752-0509-7-S6-S12. Epub 2013 Dec 13.

Impact of missing data imputation methods on gene expression clustering and classification.

BMC Bioinformatics. 2015 Feb 26;16:64. doi: 10.1186/s12859-015-0494-3.

Missing value imputation improves clustering and interpretation of gene expression microarray data.

BMC Bioinformatics. 2008 Apr 18;9:202. doi: 10.1186/1471-2105-9-202.

Two-pass imputation algorithm for missing value estimation in gene expression time series.

J Bioinform Comput Biol. 2007 Oct;5(5):1005-22. doi: 10.1142/s0219720007003053.

Ameliorative missing value imputation for robust biological knowledge inference.

J Biomed Inform. 2008 Aug;41(4):499-514. doi: 10.1016/j.jbi.2007.10.005. Epub 2007 Dec 31.

From co-expression to co-regulation: how many microarray experiments do we need?

Genome Biol. 2004;5(7):R48. doi: 10.1186/gb-2004-5-7-r48. Epub 2004 Jun 28.

Cited by

A cost-sensitive deep neural network-based prediction model for the mortality in acute myocardial infarction patients with hypertension on imbalanced data.

Front Cardiovasc Med. 2024 Mar 19;11:1276608. doi: 10.3389/fcvm.2024.1276608. eCollection 2024.

General Trends of the Antibody VHs Domain Dynamics.

Int J Mol Sci. 2023 Feb 24;24(5):4511. doi: 10.3390/ijms24054511.

Latent triple trajectories of substance use as predictors for the onset of antisocial personality disorder among urban African American and Puerto Rican adults: A 22-year longitudinal study.

Subst Abus. 2022;43(1):442-450. doi: 10.1080/08897077.2021.1946890.

Transcriptome Profiling of Atlantic Salmon (Salmo salar) Parr With Higher and Lower Pathogen Loads Following Infection.

Front Immunol. 2021 Dec 31;12:789465. doi: 10.3389/fimmu.2021.789465. eCollection 2021.

Transcriptome Profiling of Atlantic Salmon Adherent Head Kidney Leukocytes Reveals That Macrophages Are Selectively Enriched During Culture.

Front Immunol. 2021 Aug 16;12:709910. doi: 10.3389/fimmu.2021.709910. eCollection 2021.

Influence of Varying Dietary ω6 to ω3 Fatty Acid Ratios on the Hepatic Transcriptome, and Association with Phenotypic Traits (Growth, Somatic Indices, and Tissue Lipid Composition), in Atlantic Salmon (Salmo salar).

Biology (Basel). 2021 Jun 24;10(7):578. doi: 10.3390/biology10070578.

The Impact of COVID-19 on Students' Marks: A Bayesian Hierarchical Modeling Approach.

Metron. 2021;79(1):57-91. doi: 10.1007/s40300-021-00200-1. Epub 2021 Feb 17.

A flexible, interpretable, and accurate approach for imputing the expression of unmeasured genes.

Nucleic Acids Res. 2020 Dec 2;48(21):e125. doi: 10.1093/nar/gkaa881.

CAncer bioMarker Prediction Pipeline (CAMPP)-A standardized framework for the analysis of quantitative biological data.

PLoS Comput Biol. 2020 Mar 16;16(3):e1007665. doi: 10.1371/journal.pcbi.1007665. eCollection 2020 Mar.

Liver Transcriptome Profiling Reveals That Dietary DHA and EPA Levels Influence Suites of Genes Involved in Metabolism, Redox Homeostasis, and Immune Function in Atlantic Salmon (Salmo salar).

Mar Biotechnol (NY). 2020 Apr;22(2):263-284. doi: 10.1007/s10126-020-09950-x. Epub 2020 Feb 10.
