Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

Author information

Sehgal Muhammad Shoaib B, Gondal Iqbal, Dooley Laurence S

Affiliation

Gippsland School of Computing and Information Technology, Monash University, VIC 3842, Australia.

Publication information

Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24.

Abstract

MOTIVATION

Microarray data are used in a range of application areas in biology, but they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is strong motivation to estimate them as accurately as possible before applying these algorithms. While many imputation algorithms have been proposed, more robust techniques are needed so that further analysis of biological data can be undertaken accurately. This paper presents an innovative missing value imputation algorithm called collateral missing value estimation (CMVE), which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least squares regression and linear programming methods.
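The general idea of covariance-guided regression imputation can be illustrated with a minimal sketch. This is not the authors' CMVE implementation: it produces a single least squares estimate rather than combining multiple imputation matrices, and it omits the linear programming step entirely. The function name `impute_entry`, the parameter `k`, and the restriction to complete reference genes are assumptions made for illustration only.

```python
import numpy as np

def impute_entry(data, gene, sample, k=5):
    """Estimate data[gene, sample] by regressing the target gene on the
    k complete genes whose covariance with it is largest in magnitude.
    Hypothetical helper for illustration; not the published CMVE code."""
    target = data[gene]
    obs = ~np.isnan(target)          # samples where the target gene is observed
    obs[sample] = False              # never use the entry being estimated

    # Reference genes: every other row that has no missing values.
    candidates = np.array([g for g in range(data.shape[0])
                           if g != gene and not np.isnan(data[g]).any()])

    # Rank reference genes by absolute covariance with the target gene.
    cov = np.array([abs(np.cov(target[obs], data[g, obs])[0, 1])
                    for g in candidates])
    top = candidates[np.argsort(cov)[::-1][:k]]

    # Least squares fit: target ~ intercept + weighted sum of the top genes.
    design = np.column_stack([np.ones(obs.sum()), data[top][:, obs].T])
    coef, *_ = np.linalg.lstsq(design, target[obs], rcond=None)

    # Predict the missing entry from the reference genes at that sample.
    return float(np.concatenate([[1.0], data[top, sample]]) @ coef)
```

Calling `impute_entry(data, gene, sample)` on a genes-by-samples array with `np.nan` at the missing position returns one regression-based estimate for that position. Ranking by absolute covariance lets negatively co-expressed genes contribute as predictors too.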

RESULTS

The new CMVE algorithm was compared with existing estimation techniques, including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All methods were tested on estimating missing values in three separate non-time series (ovarian cancer based) datasets and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, over randomly introduced missing value probabilities ranging from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performs better not only for randomly occurring missing values but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation of missing values compared with the other methods for both time series and non-time series data, at the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm.
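An evaluation of this kind can be sketched as follows: entries of a complete matrix are hidden at random with a given probability, an imputation method fills them in, and the error is measured only on the hidden entries. The abstract does not define NRMS error, so the formula below (RMS of the estimation error divided by the RMS of the true values at the hidden positions) is an assumed, commonly used form. The row-mean imputer is a placeholder baseline, not one of the compared methods, and all function names are hypothetical.

```python
import numpy as np

def introduce_missing(data, prob, rng):
    """Hide each entry independently with probability `prob`."""
    mask = rng.random(data.shape) < prob
    corrupted = data.copy()
    corrupted[mask] = np.nan
    return corrupted, mask

def row_mean_impute(corrupted):
    """Placeholder baseline: fill each gap with its row (gene) mean."""
    filled = corrupted.copy()
    row_means = np.nanmean(corrupted, axis=1)
    rows, cols = np.where(np.isnan(corrupted))
    filled[rows, cols] = row_means[rows]
    return filled

def nrms_error(truth, estimate, mask):
    """Assumed NRMS form: RMS(estimate - truth) / RMS(truth) on hidden entries."""
    diff = estimate[mask] - truth[mask]
    return np.sqrt(np.mean(diff ** 2)) / np.sqrt(np.mean(truth[mask] ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.normal(size=(200, 20))        # synthetic stand-in for expression data
    for prob in (0.01, 0.05, 0.1, 0.2):       # range quoted in the abstract
        corrupted, mask = introduce_missing(truth, prob, rng)
        print(prob, nrms_error(truth, row_mean_impute(corrupted), mask))
```

Any imputation function that accepts the corrupted matrix and returns a filled one could be substituted for `row_mean_impute` to compare methods under the same masks.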

AVAILABILITY

The CMVE software is available upon request from the authors.
