Improving missing value estimation in microarray data with gene ontology.

Author information

Tuikkala Johannes, Elo Laura, Nevalainen Olli S, Aittokallio Tero

Affiliation

Department of Information Technology, University of Turku, Lemminkäisenkatu 14A, FIN-20520, Finland.

Publication information

Bioinformatics. 2006 Mar 1;22(5):566-72. doi: 10.1093/bioinformatics/btk019. Epub 2005 Dec 23.

DOI:10.1093/bioinformatics/btk019

PMID:16377613

Abstract

MOTIVATION

Gene expression microarray experiments produce datasets with frequent missing expression values. Accurate estimation of missing values is an important prerequisite for efficient data analysis as many statistical and machine learning techniques either require a complete dataset or their results are significantly dependent on the quality of such estimates. A limitation of the existing estimation methods for microarray data is that they use no external information but the estimation is based solely on the expression data. We hypothesized that utilizing a priori information on functional similarities available from public databases facilitates the missing value estimation.

RESULTS

We investigated whether semantic similarity originating from gene ontology (GO) annotations could improve the selection of relevant genes for missing value estimation. The relative contribution of each information source was automatically estimated from the data using an adaptive weight selection procedure. Our experimental results in yeast cDNA microarray datasets indicated that by considering GO information in the k-nearest neighbor algorithm we can enhance its performance considerably, especially when the number of experimental conditions is small and the percentage of missing values is high. The increase of performance was less evident with a more sophisticated estimation method. We conclude that even a small proportion of annotated genes can provide improvements in data quality significant for the eventual interpretation of the microarray experiments.
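The idea of letting GO-based similarity influence neighbor selection in k-nearest neighbor imputation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule that combines the two information sources (`d * (1 - alpha * similarity)`), the inverse-distance weighting, the grid search used for weight selection, and all function names are hypothetical. Pairwise GO semantic similarities are assumed to be precomputed in [0, 1], with unannotated gene pairs defaulting to 0.

```python
import math

def expression_distance(a, b):
    """Root-mean-square difference over conditions observed in both profiles."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))

def go_similarity(sim, i, j):
    """Look up a precomputed similarity; unannotated pairs count as 0."""
    return sim.get((min(i, j), max(i, j)), 0.0)

def knn_impute(data, gene, cond, sim, k, alpha):
    """Estimate data[gene][cond] from the k nearest genes.

    Assumed combination rule: the expression distance is shrunk for
    functionally similar genes, so alpha = 0 gives plain expression-only KNN.
    """
    scored = []
    for other, profile in enumerate(data):
        if other == gene or profile[cond] is None:
            continue  # a neighbor must have an observed value at cond
        d = expression_distance(data[gene], profile)
        if math.isinf(d):
            continue  # no shared conditions, so no usable distance
        d *= 1.0 - alpha * go_similarity(sim, gene, other)
        scored.append((d, profile[cond]))
    if not scored:
        return None
    scored.sort(key=lambda t: t[0])
    nearest = scored[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

def select_alpha(data, sim, k, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Pick the GO weight that best re-estimates known entries hidden one at a time."""
    known = [(g, c) for g, row in enumerate(data)
             for c, v in enumerate(row) if v is not None]
    best_alpha, best_err = grid[0], math.inf
    for alpha in grid:
        err = 0.0
        for g, c in known:
            truth = data[g][c]
            data[g][c] = None            # hide the entry temporarily
            est = knn_impute(data, g, c, sim, k, alpha)
            data[g][c] = truth           # restore it
            if est is not None:
                err += (est - truth) ** 2
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha

if __name__ == "__main__":
    # Toy matrix: rows are genes, columns are conditions, None marks a missing value.
    data = [
        [1.0, 2.0, None],  # gene 0: value to estimate
        [1.1, 2.1, 9.0],   # gene 1: closest in expression, no shared annotation
        [1.2, 2.2, 3.0],   # gene 2: slightly farther, functionally similar
        [5.0, 6.0, 7.0],   # gene 3: distant
    ]
    sim = {(0, 2): 1.0}    # hypothetical GO similarity for the pair (0, 2)
    print(knn_impute(data, 0, 2, sim, k=1, alpha=0.0))  # neighbor chosen by expression only
    print(knn_impute(data, 0, 2, sim, k=1, alpha=1.0))  # neighbor chosen with GO information
    print(select_alpha(data, sim, k=1))
```

In the toy matrix, the expression-only run selects gene 1 as the nearest neighbor, while the GO-weighted run selects gene 2, which shows how annotation for even a single gene pair can change the estimate. With few conditions, expression distances are computed from few values, which is the setting where the abstract reports the largest benefit.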

AVAILABILITY

Java and Matlab codes are available on request from the authors.

SUPPLEMENTARY MATERIAL

Available online at http://users.utu.fi/jotatu/GOImpute.html.

Similar articles

Integrative missing value estimation for microarray data.

BMC Bioinformatics. 2006 Oct 12;7:449. doi: 10.1186/1471-2105-7-449.

Iterated local least squares microarray missing value imputation.

J Bioinform Comput Biol. 2006 Oct;4(5):935-57. doi: 10.1142/s0219720006002302.

Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

Bioinformatics. 2005 May 15;21(10):2417-23. doi: 10.1093/bioinformatics/bti345. Epub 2005 Feb 24.

Missing value estimation for DNA microarray gene expression data: local least squares imputation.

Bioinformatics. 2005 Jan 15;21(2):187-98. doi: 10.1093/bioinformatics/bth499. Epub 2004 Aug 27.

Robust imputation method for missing values in microarray data.

BMC Bioinformatics. 2007 May 3;8 Suppl 2(Suppl 2):S6. doi: 10.1186/1471-2105-8-S2-S6.

Re-sampling strategy to improve the estimation of number of null hypotheses in FDR control under strong correlation structures.

BMC Bioinformatics. 2007 May 18;8:157. doi: 10.1186/1471-2105-8-157.

Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme.

BMC Bioinformatics. 2006 Jan 22;7:32. doi: 10.1186/1471-2105-7-32.

DNA microarray data imputation and significance analysis of differential expression.

Bioinformatics. 2005 Nov 15;21(22):4155-61. doi: 10.1093/bioinformatics/bti638. Epub 2005 Aug 23.

pcaMethods--a bioconductor package providing PCA methods for incomplete data.

Bioinformatics. 2007 May 1;23(9):1164-7. doi: 10.1093/bioinformatics/btm069. Epub 2007 Mar 7.

Cited by

Tutorial on survival modeling with applications to omics data.

Bioinformatics. 2024 Mar 4;40(3). doi: 10.1093/bioinformatics/btae132.

A comprehensive survey on computational learning methods for analysis of gene expression data.

Front Mol Biosci. 2022 Nov 7;9:907150. doi: 10.3389/fmolb.2022.907150. eCollection 2022.

Efficient technique of microarray missing data imputation using clustering and weighted nearest neighbour.

Sci Rep. 2021 Dec 21;11(1):24297. doi: 10.1038/s41598-021-03438-x.

A Review of Integrative Imputation for Multi-Omics Datasets.

Front Genet. 2020 Oct 15;11:570255. doi: 10.3389/fgene.2020.570255. eCollection 2020.

Imputation of Gene Expression Data in Blood Cancer and Its Significance in Inferring Biological Pathways.

Front Oncol. 2020 Jan 8;9:1442. doi: 10.3389/fonc.2019.01442. eCollection 2019.

DrImpute: imputing dropout events in single cell RNA sequencing data.

BMC Bioinformatics. 2018 Jun 8;19(1):220. doi: 10.1186/s12859-018-2226-y.

A comprehensive evaluation of popular proteomics software workflows for label-free proteome quantification and imputation.

Brief Bioinform. 2018 Nov 27;19(6):1344-1355. doi: 10.1093/bib/bbx054.

An improved method for functional similarity analysis of genes based on Gene Ontology.

BMC Syst Biol. 2016 Dec 23;10(Suppl 4):119. doi: 10.1186/s12918-016-0359-z.

MVIAeval: a web tool for comprehensively evaluating the performance of a new missing value imputation algorithm.

BMC Bioinformatics. 2017 Jan 13;18(1):31. doi: 10.1186/s12859-016-1429-3.

An integrative imputation method based on multi-omics datasets.

BMC Bioinformatics. 2016 Jun 21;17:247. doi: 10.1186/s12859-016-1122-6.
