Missing value imputation improves clustering and interpretation of gene expression microarray data.

Author information

Tuikkala Johannes, Elo Laura L, Nevalainen Olli S, Aittokallio Tero

Affiliations

Department of Information Technology and TUCS, University of Turku, FI-20014 Turku, Finland.

Publication information

BMC Bioinformatics. 2008 Apr 18;9:202. doi: 10.1186/1471-2105-9-202.

DOI:10.1186/1471-2105-9-202

PMID:18423022

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2386492/

Abstract

BACKGROUND

Missing values frequently pose problems in gene expression microarray experiments as they can hinder downstream analysis of the datasets. While several missing value imputation approaches are available to the microarray users and new ones are constantly being developed, there is no general consensus on how to choose between the different methods since their performance seems to vary drastically depending on the dataset being used.

RESULTS

We show that this discrepancy can mostly be attributed to the way in which imputation methods have traditionally been developed and evaluated. By comparing a number of advanced imputation methods on recent microarray datasets, we show that even when there are marked differences in the measurement-level imputation accuracies across the datasets, these differences become negligible when the methods are evaluated in terms of how well they can reproduce the original gene clusters or their biological interpretations. Regardless of the evaluation approach, however, imputation always gave better results than ignoring missing data points or replacing them with zeros or average values, emphasizing the continued importance of using more advanced imputation methods.
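To make the idea of "measurement-level imputation accuracy" concrete, the following is a minimal sketch, not the authors' evaluation pipeline. It hides a fraction of the entries in a synthetic expression matrix, fills them back in with zero replacement, row-average replacement, and a simple k-nearest-neighbour imputation, and scores each with a normalised root mean squared error (NRMSE). The synthetic data, the `run_demo` helper, the choice of `k`, and this particular NRMSE normalisation are all assumptions made for illustration; the nearest-neighbour method is a stand-in for "more advanced" imputation and is not BPCA.

```python
import math
import random


def nrmse(truth, estimate):
    """RMSE divided by the standard deviation of the true values (one common normalisation)."""
    n = len(truth)
    mse = sum((t - e) ** 2 for t, e in zip(truth, estimate)) / n
    mean_t = sum(truth) / n
    var_t = sum((t - mean_t) ** 2 for t in truth) / n
    return math.sqrt(mse / var_t)


def row_mean(row):
    """Mean of the observed (non-None) entries of one gene; 0.0 if none are observed."""
    observed = [v for v in row if v is not None]
    return sum(observed) / len(observed) if observed else 0.0


def impute_zero(data):
    return [[0.0 if v is None else v for v in row] for row in data]


def impute_row_mean(data):
    return [[row_mean(row) if v is None else v for v in row] for row in data]


def impute_knn(data, k=5):
    """Fill each gap with the average of that column over the k most similar genes."""
    out = [list(row) for row in data]
    for i, row in enumerate(data):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Rank the other genes by RMS distance over columns observed in both.
        ranked = []
        for other_idx, other in enumerate(data):
            if other_idx == i:
                continue
            shared = [(a, b) for a, b in zip(row, other)
                      if a is not None and b is not None]
            if shared:
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                ranked.append((dist, other_idx))
        ranked.sort()
        for j in missing:
            donors = [data[idx][j] for _, idx in ranked
                      if data[idx][j] is not None][:k]
            # Fall back to the row average if no neighbour observed this column.
            out[i][j] = sum(donors) / len(donors) if donors else row_mean(row)
    return out


def run_demo(seed=0, n_genes=60, n_arrays=8, missing_rate=0.1):
    """Hypothetical benchmark: mask known values, impute, and score each method."""
    rng = random.Random(seed)
    # Synthetic "genes": three groups sharing a profile, plus Gaussian noise.
    full = [[3.0 + 2.0 * math.sin(c + 2.0 * (g % 3)) + rng.gauss(0.0, 0.2)
             for c in range(n_arrays)] for g in range(n_genes)]
    cells = [(i, j) for i in range(n_genes) for j in range(n_arrays)]
    hidden = rng.sample(cells, int(missing_rate * len(cells)))
    masked = [list(row) for row in full]
    for i, j in hidden:
        masked[i][j] = None
    truth = [full[i][j] for i, j in hidden]
    methods = {"zero": impute_zero, "row_mean": impute_row_mean, "knn": impute_knn}
    scores = {}
    for name, method in methods.items():
        filled = method(masked)
        scores[name] = nrmse(truth, [filled[i][j] for i, j in hidden])
    return scores


scores = run_demo()
for name, value in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>8}: NRMSE = {value:.3f}")
```

On data like this, where genes fall into well-separated co-expressed groups, the neighbour-based method recovers hidden values far more closely than either simple replacement. Note that NRMSE only measures entry-wise error; the abstract's central point is that methods which differ on this score can still perform almost identically once they are judged by how well the original gene clusters or their biological interpretations are reproduced.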

CONCLUSION

The results demonstrate that, while missing values are still severely complicating microarray data analysis, their impact on the discovery of biologically meaningful gene groups can - up to a certain degree - be reduced by using readily available and relatively fast imputation methods, such as the Bayesian Principal Components Algorithm (BPCA).
