Two-pass imputation algorithm for missing value estimation in gene expression time series.

Authors

Tsiporkova Elena, Boeva Veselka

Affiliation

Department of Molecular Genetics, Ghent University, Technologiepark 927, 9052 Ghent, Belgium.

Publication

J Bioinform Comput Biol. 2007 Oct;5(5):1005-22. doi: 10.1142/s0219720007003053.

PMID:17933008

Abstract

Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. The accurate estimation of missing values in such datasets has therefore been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm that is specially suited to the estimation of missing values in gene expression time series data. The algorithm uses the Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles, and then selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three different DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These were initially prototyped in Perl, and their accuracy was evaluated on yeast expression time series data using several different parameter settings. The experiments showed that the two-pass algorithm consistently outperforms the neighborhood-wise and position-wise algorithms, in particular on datasets with a higher proportion of missing entries. The performance of the two-pass DTWimpute algorithm was further benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community, and the two-pass DTWimpute algorithm proved superior. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also offers a choice of three different initial rough imputation methods.
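The abstract names the ingredients of DTWimpute (a rough initial imputation, DTW distances between profiles, and a dedicated set of candidate profiles per gene) but not their exact definitions. The Python sketch below illustrates one way these pieces can fit together under stated assumptions; it is not the authors' implementation. Missing entries are marked `None`; the rough fill is a row mean (one plausible choice, since the abstract does not say which three rough methods the software offers); candidates are the `k` DTW-nearest other profiles; and each missing value is an inverse-distance-weighted average of the candidates' values at that time point. The "two passes" here simply repeat the neighbor search on the first-pass result. The names `row_mean_fill`, `impute_pass`, and `two_pass_impute` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences, using |x - y| as the local cost."""
    n, m = len(a), len(b)
    # d[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def row_mean_fill(matrix):
    """Rough initial imputation (assumed): replace None by the row's observed mean."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def impute_pass(original, current, k):
    """Re-estimate every entry missing in `original` from the k DTW-nearest
    profiles of `current`, weighting each neighbour by 1 / distance."""
    result = [row[:] for row in current]
    for g, row in enumerate(original):
        missing = [t for t, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Hypothetical candidate set: the k other genes closest to gene g under DTW.
        nearest = sorted(
            (dtw_distance(current[g], current[h]), h)
            for h in range(len(current))
            if h != g
        )[:k]
        if not nearest:
            continue  # no other genes: keep the previous estimate
        weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]  # avoid division by zero
        total = sum(weights)
        for t in missing:
            result[g][t] = sum(w * current[h][t] for w, (_, h) in zip(weights, nearest)) / total
    return result


def two_pass_impute(matrix, k=2):
    """Rough fill, then two DTW-based passes; None marks a missing value."""
    rough = row_mean_fill(matrix)
    first = impute_pass(matrix, rough, k)
    return impute_pass(matrix, first, k)


data = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, None, 4.1],
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 1.0, 5.0, 1.0],
]
print(two_pass_impute(data, k=2)[1])
```

With `k=2`, the two identical profiles are the nearest candidates, so the missing value in the second row is estimated at approximately 3.0. Because `impute_pass` reads distances and neighbor values only from the previous matrix, estimates within one pass do not influence each other; a neighbor's value at the missing time point may itself be an estimate carried over from the previous pass.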

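For comparison, the weighted K-Nearest Neighbors method that the abstract uses as a benchmark can be sketched in the same style. This is a generic KNNimpute-style illustration, not the exact benchmark configuration used in the paper. For each missing entry, candidate genes must be observed at that time point; the distance is the root-mean-square difference over the time points both genes share (an assumption made here to cope with differing missing patterns); and the estimate is the inverse-distance-weighted mean of the `k` closest candidates. Unlike DTW, this distance compares profiles strictly position by position, so it cannot absorb shifts or stretches in time. The function name `knn_impute` is hypothetical.

```python
import math


def knn_impute(matrix, k=3):
    """Weighted KNN imputation sketch; None marks a missing value."""
    result = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        for t, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                # A candidate must be a different gene observed at time point t.
                if h == g or other[t] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                # Assumed distance: RMS difference over jointly observed time points.
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, h))
            nearest = sorted(candidates)[:k]
            if not nearest:
                # No usable neighbour: fall back to the row mean of observed values.
                observed = [x for x in row if x is not None]
                result[g][t] = sum(observed) / len(observed) if observed else 0.0
                continue
            weights = [1.0 / (dist + 1e-9) for dist, _ in nearest]
            result[g][t] = sum(w * matrix[h][t] for w, (_, h) in zip(weights, nearest)) / sum(weights)
    return result


data = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, None, 4.1],
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 1.0, 5.0, 1.0],
]
print(knn_impute(data, k=2)[1])
```

Here the missing value is estimated from raw observed neighbors only, with no initial rough imputation step.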

Similar articles

Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24.

Autoregressive-model-based missing value estimation for DNA microarray time series data.

IEEE Trans Inf Technol Biomed. 2009 Jan;13(1):131-7. doi: 10.1109/TITB.2008.2007421.

Sequential imputation for missing values.

Comput Biol Chem. 2007 Oct;31(5-6):320-7. doi: 10.1016/j.compbiolchem.2007.07.001. Epub 2007 Jul 10.

Missing value imputation for microarray data: a comprehensive comparison study and a web tool.

BMC Syst Biol. 2013;7 Suppl 6(Suppl 6):S12. doi: 10.1186/1752-0509-7-S6-S12. Epub 2013 Dec 13.

Missing value imputation for gene expression data by tailored nearest neighbors.

Stat Appl Genet Mol Biol. 2017 Apr 25;16(2):95-106. doi: 10.1515/sagmb-2015-0098.

MVIAeval: a web tool for comprehensively evaluating the performance of a new missing value imputation algorithm.

BMC Bioinformatics. 2017 Jan 13;18(1):31. doi: 10.1186/s12859-016-1429-3.

Missing value imputation for microRNA expression data by using a GO-based similarity measure.

BMC Bioinformatics. 2016 Jan 11;17 Suppl 1(Suppl 1):10. doi: 10.1186/s12859-015-0853-0.

A global learning with local preservation method for microarray data imputation.

Comput Biol Med. 2016 Oct 1;77:76-89. doi: 10.1016/j.compbiomed.2016.08.005. Epub 2016 Aug 5.

Which missing value imputation method to use in expression profiles: a comparative study and two selection schemes.

BMC Bioinformatics. 2008 Jan 10;9:12. doi: 10.1186/1471-2105-9-12.

Cited by

Subject-specific Estimation of Missing Cortical Thickness Maps in Developing Infant Brains.

Med Comput Vis (2015). 2016;9601:83-92. doi: 10.1007/978-3-319-42016-5_8. Epub 2016 Jul 30.

Learning-based subject-specific estimation of dynamic maps of cortical morphology at missing time points in longitudinal infant studies.

Hum Brain Mapp. 2016 Nov;37(11):4129-4147. doi: 10.1002/hbm.23301.

A formal concept analysis approach to consensus clustering of multi-experiment expression data.

BMC Bioinformatics. 2014 May 19;15:151. doi: 10.1186/1471-2105-15-151.

Clusters of temporal discordances reveal distinct embryonic patterning mechanisms in Drosophila and anopheles.

PLoS Biol. 2011 Jan 25;9(1):e1000584. doi: 10.1371/journal.pbio.1000584.

Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments.

BMC Genomics. 2010 Jan 7;11:15. doi: 10.1186/1471-2164-11-15.

Time warping of evolutionary distant temporal gene expression data based on noise suppression.

BMC Bioinformatics. 2009 Oct 26;10:353. doi: 10.1186/1471-2105-10-353.
