Biological impact of missing-value imputation on downstream analyses of gene expression profiles.

Affiliation

Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA.

Publication information

Bioinformatics. 2011 Jan 1;27(1):78-86. doi: 10.1093/bioinformatics/btq613. Epub 2010 Nov 2.

Abstract

MOTIVATION

Microarray experiments frequently produce multiple missing values (MVs) due to flaws such as dust, scratches, insufficient resolution or hybridization errors on the chips. Unfortunately, many downstream algorithms require a complete data matrix. The motivation of this work is to determine the impact of MV imputation on downstream analysis, and whether ranking of imputation methods by imputation accuracy correlates well with the biological impact of the imputation.

METHODS

Using eight datasets for differential expression (DE) and classification analysis and eight datasets for gene clustering, we demonstrate the biological impact of missing-value imputation on statistical downstream analyses, including three commonly employed DE methods, four classifiers and three gene-clustering methods. Correlation between the rankings of imputation methods based on three root-mean squared error (RMSE) measures and the rankings based on the downstream analysis methods was used to investigate which RMSE measure was most consistent with the biological impact measures, and which downstream analysis methods were the most sensitive to the choice of imputation procedure.
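The ranking comparison described above can be illustrated with a rank correlation. The following is a minimal sketch, assuming Spearman's rank correlation is an acceptable stand-in; the abstract does not say which correlation statistic was used. The method labels and all numbers are hypothetical placeholders, not values from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scores for four imputation methods (lower is better in both).
methods = ["method_A", "method_B", "method_C", "method_D"]
rmse_score = [0.40, 0.55, 0.62, 0.80]        # accuracy-based measure
downstream_loss = [0.10, 0.12, 0.30, 0.25]   # e.g. loss in a downstream analysis

rho = spearman(rmse_score, downstream_loss)
print(f"rank agreement between the two orderings: {rho:.2f}")
```

A value near 1 would mean that the accuracy-based ranking and the downstream ranking order the methods almost identically; a value near 0 would mean that the accuracy measure says little about downstream impact.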

RESULTS

DE was the most sensitive to the choice of imputation procedure, while classification was the least sensitive and clustering was intermediate between the two. The logged RMSE (LRMSE) measure had the highest correlation with the imputation rankings based on the DE results, indicating that the LRMSE is the best representative surrogate among the three RMSE-based measures. Bayesian principal component analysis and least squares adaptive appeared to be the best performing methods in the empirical downstream evaluation.
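RMSE-type accuracy measures such as those mentioned above are usually obtained by hiding values that are actually known, imputing them, and comparing the imputed values with the truth. The abstract does not define the three RMSE measures (including LRMSE), so the sketch below shows only a plain RMSE on artificially masked entries. The row-mean imputer is a deliberately simple placeholder, not one of the methods evaluated in the paper, and all function names are hypothetical.

```python
import random
from math import sqrt


def mask_entries(matrix, fraction, seed=0):
    """Return a copy of matrix with a random fraction of cells set to None,
    plus the list of (row, column) positions that were hidden."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    hidden = rng.sample(cells, max(1, int(fraction * len(cells))))
    masked = [list(row) for row in matrix]  # copy so the input is not mutated
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def row_mean_impute(matrix):
    """Placeholder imputer: replace each None with the mean of the observed
    values in the same row (0.0 if the whole row is missing)."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def rmse_on_hidden(truth, imputed, hidden):
    """Root mean squared error restricted to the artificially hidden cells."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return sqrt(sum(squared) / len(squared))


# Toy genes-by-samples matrix (made-up numbers, for illustration only).
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.5, 3.5, 2.0],
    [5.0, 4.0, 6.0, 5.0],
]
masked, hidden = mask_entries(expression, fraction=0.25, seed=1)
completed = row_mean_impute(masked)
print("RMSE on hidden cells:", rmse_on_hidden(expression, completed, hidden))
```

Repeating this for several imputers and sorting by the resulting error gives an accuracy-based ranking that can then be compared with a ranking derived from a downstream analysis.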
