Missing value imputation for microarray data: a comprehensive comparison study and a web tool.

Authors

Chiu Chia-Chun, Chan Shih-Yao, Wang Chung-Ching, Wu Wei-Sheng

Publication information

BMC Syst Biol. 2013;7 Suppl 6(Suppl 6):S12. doi: 10.1186/1752-0509-7-S6-S12. Epub 2013 Dec 13.

DOI:10.1186/1752-0509-7-S6-S12
PMID:24565220
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4028811/
Abstract

BACKGROUND

Microarray data are usually peppered with missing values for various reasons. However, most downstream analyses of microarray data require complete datasets. Accurate algorithms for missing value estimation are therefore needed to improve the performance of microarray data analyses. Although many algorithms have been developed, the choice of the optimal algorithm remains much debated. Studies comparing the performance of different algorithms are still not comprehensive, especially in the number of benchmark datasets used, the number of algorithms compared, the rounds of simulation conducted, and the performance measures used.

RESULTS

In this paper, we performed a comprehensive comparison using (I) thirteen datasets, (II) nine algorithms, (III) 110 independent runs of simulation, and (IV) three types of measures to evaluate the performance of each imputation algorithm fairly. First, we evaluated the effects of different types of microarray datasets on the performance of each imputation algorithm. Second, we examined whether datasets from different species affect the performance of the algorithms differently. All evaluations were performed using the three types of measures. Our results indicate that the performance of an imputation algorithm depends mainly on the type of dataset, not on the species from which the samples come. In addition to the statistical measure, two other measures with biological meaning are useful for reflecting the impact of missing value imputation on downstream data analyses. Our study suggests that local-least-squares-based methods are good choices for handling missing values in most microarray datasets.
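
To make the idea behind local-least-squares-based imputation concrete, the following is a minimal sketch, not the implementation benchmarked in the paper. It assumes a NumPy matrix with genes as rows and samples as columns, NaN for missing entries, Euclidean distance for neighbour selection, and a hypothetical function name `lls_impute` with an arbitrary default of `k=5` neighbours.

```python
import numpy as np

def lls_impute(X, k=5):
    """Simplified local-least-squares imputation (illustrative only).

    X: 2-D array, rows = genes, columns = samples, NaN = missing.
    k: number of neighbour genes used in each regression (assumed default).
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    # Only genes without any missing value are used as neighbour candidates.
    complete = X[~np.isnan(X).any(axis=1)]
    if len(complete) == 0:
        raise ValueError("need at least one gene without missing values")
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        miss = np.isnan(X[i])
        obs = ~miss
        if not obs.any():
            # Nothing to regress on: fall back to column means of complete genes.
            out[i, miss] = complete[:, miss].mean(axis=0)
            continue
        # Pick the k complete genes closest to the target on its observed columns.
        dist = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        nn = complete[np.argsort(dist)[:k]]
        # Least squares: express the target's observed values as a linear
        # combination of the neighbours' values in the same columns.
        coef, *_ = np.linalg.lstsq(nn[:, obs].T, X[i, obs], rcond=None)
        # Apply the same combination to the columns where the target is missing.
        out[i, miss] = coef @ nn[:, miss]
    return out
```

Calling `lls_impute(expr, k=10)` on such a matrix returns a copy with every NaN filled in. Published local-least-squares variants differ in how neighbours are chosen and how `k` is tuned, so this sketch only conveys the regression step.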

CONCLUSIONS

In this work, we carried out a comprehensive comparison of algorithms for microarray missing value imputation. Based on this comparison, researchers can more easily choose the optimal algorithm for their datasets. Moreover, new imputation algorithms can be compared with existing ones using this comparison strategy as a standard protocol. In addition, to help researchers deal with missing values, we built an easy-to-use web-based imputation tool, MissVIA (http://cosbi.ee.ncku.edu.tw/MissVIA), which supports many imputation algorithms. Once users upload a real microarray dataset and choose the imputation algorithms, MissVIA determines the optimal algorithm for the users' data through a series of simulations, and the imputed results can then be downloaded for downstream data analyses.
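
The simulation-based selection described above can be illustrated with a short sketch. It is not MissVIA's code: the abstract does not specify the tool's internals, so the masking rate, the number of runs, the use of normalized root mean squared error (NRMSE) as the statistical measure, and the two toy imputers are all assumptions, and every function name here is hypothetical.

```python
import numpy as np

def nrmse(truth, guess):
    """Normalized RMSE between hidden true values and their imputed estimates."""
    return np.sqrt(np.mean((guess - truth) ** 2)) / np.std(truth)

def row_mean_impute(X):
    """Toy imputer: replace each NaN with the mean of its row (gene)."""
    out = X.copy()
    means = np.nanmean(X, axis=1, keepdims=True)
    # Rows that are entirely missing fall back to the global mean.
    means = np.where(np.isnan(means), np.nanmean(X), means)
    idx = np.isnan(out)
    out[idx] = np.broadcast_to(means, X.shape)[idx]
    return out

def zero_impute(X):
    """Toy imputer: replace each NaN with zero."""
    return np.nan_to_num(X, nan=0.0)

def pick_best(X, imputers, rate=0.05, runs=10, seed=0):
    """Hide known entries at random, impute, score, and return the best imputer.

    rate, runs, and seed are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    observed = np.argwhere(~np.isnan(X))
    n_hide = max(2, int(rate * len(observed)))
    scores = {name: [] for name in imputers}
    for _ in range(runs):
        pick = observed[rng.choice(len(observed), size=n_hide, replace=False)]
        r, c = pick[:, 0], pick[:, 1]
        masked = X.copy()
        masked[r, c] = np.nan  # simulate missing values at known positions
        for name, fn in imputers.items():
            scores[name].append(nrmse(X[r, c], fn(masked)[r, c]))
    mean_scores = {name: float(np.mean(s)) for name, s in scores.items()}
    return min(mean_scores, key=mean_scores.get), mean_scores
```

A call such as `pick_best(expr, {"row_mean": row_mean_impute, "zero": zero_impute})` returns the name of the imputer with the lowest average NRMSE together with all average scores. The abstract also reports two measures with biological meaning, which this sketch does not attempt to reproduce.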
