Biological impact of missing-value imputation on downstream analyses of gene expression profiles.

Affiliation

Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA.

Publication information

Bioinformatics. 2011 Jan 1;27(1):78-86. doi: 10.1093/bioinformatics/btq613. Epub 2010 Nov 2.

Abstract

MOTIVATION

Microarray experiments frequently produce multiple missing values (MVs) due to flaws such as dust, scratches, insufficient resolution or hybridization errors on the chips. Unfortunately, many downstream algorithms require a complete data matrix. The motivation of this work is to determine the impact of MV imputation on downstream analysis, and whether ranking of imputation methods by imputation accuracy correlates well with the biological impact of the imputation.

METHODS

Using eight datasets for differential expression (DE) and classification analysis and eight datasets for gene clustering, we demonstrate the biological impact of missing-value imputation on statistical downstream analyses, including three commonly employed DE methods, four classifiers and three gene-clustering methods. Correlation between the rankings of imputation methods based on three root-mean squared error (RMSE) measures and the rankings based on the downstream analysis methods was used to investigate which RMSE measure was most consistent with the biological impact measures, and which downstream analysis methods were the most sensitive to the choice of imputation procedure.
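The ranking comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the correlation statistic or define the three RMSE variants, so plain RMSE and Spearman rank correlation are assumed here, and the function names and toy numbers are hypothetical.

```python
import math

def rmse(true_vals, imputed_vals):
    """Root-mean-squared error over entries that were masked and then imputed."""
    if len(true_vals) != len(imputed_vals) or not true_vals:
        raise ValueError("inputs must be non-empty and of equal length")
    sq = [(t - i) ** 2 for t, i in zip(true_vals, imputed_vals)]
    return math.sqrt(sum(sq) / len(sq))

def ranks(scores):
    """Average ranks (1 = smallest score); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in order[pos:end + 1]:
            result[k] = avg
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all ranks are tied")
    return cov / math.sqrt(vx * vy)

# Hypothetical toy values: one entry per imputation method (not from the paper).
rmse_by_method = [0.21, 0.35, 0.28, 0.50]             # imputation error
downstream_loss_by_method = [0.05, 0.12, 0.15, 0.30]  # e.g. loss in DE agreement

rho = spearman(rmse_by_method, downstream_loss_by_method)
print(f"rank correlation between RMSE and downstream impact: {rho:.2f}")
```

A value of rho near 1 would mean that ranking methods by imputation error gives nearly the same order as ranking them by downstream impact, which is the sense in which an error measure acts as a surrogate.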

RESULTS

DE was the most sensitive to the choice of imputation procedure, while classification was the least sensitive and clustering was intermediate between the two. The logged RMSE (LRMSE) measure had the highest correlation with the imputation rankings based on the DE results, indicating that the LRMSE is the best representative surrogate among the three RMSE-based measures. Bayesian principal component analysis and least squares adaptive appeared to be the best performing methods in the empirical downstream evaluation.

Similar articles

4
Gaussian mixture clustering and imputation of microarray data.
Bioinformatics. 2004 Apr 12;20(6):917-23. doi: 10.1093/bioinformatics/bth007. Epub 2004 Jan 29.
5
A multi-stage approach to clustering and imputation of gene expression profiles.
Bioinformatics. 2007 Apr 15;23(8):998-1005. doi: 10.1093/bioinformatics/btm053. Epub 2007 Feb 18.
6
A hybrid imputation approach for microarray missing value estimation.
BMC Genomics. 2015;16 Suppl 9(Suppl 9):S1. doi: 10.1186/1471-2164-16-S9-S1. Epub 2015 Aug 17.
7
DNA microarray data imputation and significance analysis of differential expression.
Bioinformatics. 2005 Nov 15;21(22):4155-61. doi: 10.1093/bioinformatics/bti638. Epub 2005 Aug 23.

References

1
Over-optimism in bioinformatics: an illustration.
Bioinformatics. 2010 Aug 15;26(16):1990-8. doi: 10.1093/bioinformatics/btq323. Epub 2010 Jun 26.
