A global meta-analysis of microarray expression data to predict unknown gene functions and estimate the literature-data divide.

Author information

Wren Jonathan D

Affiliation

Arthritis and Immunology Research Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK 73104-5005, USA.

Publication information

Bioinformatics. 2009 Jul 1;25(13):1694-701. doi: 10.1093/bioinformatics/btp290. Epub 2009 May 15.

Abstract

MOTIVATION

Approximately 9334 (37%) of human genes have no publications documenting their function and, for those that are published, the number of publications per gene is highly skewed. Furthermore, for reasons not clear, the entry of new gene names into the literature has slowed in recent years. If we are to better understand human/mammalian biology and complete the catalog of human gene function, it is important to finish predicting putative functions for these genes based upon existing experimental evidence.

RESULTS

A global meta-analysis (GMA) of all publicly available GEO two-channel human microarray datasets (3551 experiments total) was conducted to identify genes with recurrent, reproducible patterns of co-regulation across different conditions. Patterns of co-expression were divided into parallel (i.e. genes are up- and down-regulated together) and anti-parallel. Several ranking methods to predict a gene's function based on its top 20 co-expressed gene pairs were compared. In the best method, 34% of predicted Gene Ontology (GO) categories matched exactly with the known GO categories for approximately 5000 genes analyzed versus only 3% for random gene sets. Only 2.4% of co-expressed gene pairs were found as co-occurring gene pairs in MEDLINE.
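The distinction between parallel and anti-parallel co-expression can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not specify how expression changes were called or how pairs were scored. The sketch assumes each gene has one discretized call per experiment (+1 up-regulated, -1 down-regulated, 0 unchanged), and the names `calls`, `coexpression_counts`, `top_partners` and the gene labels are hypothetical.

```python
from collections import Counter

def coexpression_counts(calls, query):
    """Count, for every other gene, the experiments in which it changes in the
    same direction as the query (parallel) or the opposite one (anti-parallel).
    Experiments where either gene is unchanged (0) are ignored."""
    parallel, antiparallel = Counter(), Counter()
    query_calls = calls[query]
    for gene, gene_calls in calls.items():
        if gene == query:
            continue
        for q, g in zip(query_calls, gene_calls):
            if q == 0 or g == 0:
                continue
            if q == g:
                parallel[gene] += 1
            else:
                antiparallel[gene] += 1
    return parallel, antiparallel

def top_partners(counts, k=20):
    """Return the k genes with the highest counts (ties broken by name)."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:k]]

# Toy data (assumed): one call per experiment for four hypothetical genes.
calls = {
    "GENE_A": [1, -1, 1, 0, 1],
    "GENE_B": [1, -1, 1, 1, 0],
    "GENE_C": [-1, 1, -1, 0, -1],
    "GENE_D": [0, 0, 1, 0, -1],
}

parallel, antiparallel = coexpression_counts(calls, "GENE_A")
print(top_partners(parallel, k=20))      # partners ranked by parallel count
print(top_partners(antiparallel, k=20))  # partners ranked by anti-parallel count
```

Keeping the two counters separate mirrors the abstract's observation that parallel and anti-parallel partners are best analyzed separately; any real ranking would also need a significance criterion, which this sketch omits.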

CONCLUSIONS

Via a GO enrichment analysis, genes co-expressed in parallel with the query gene were frequently associated with the same GO categories, whereas anti-parallel genes were not. Combining parallel and anti-parallel genes for analysis resulted in fewer significant GO categories, suggesting they are best analyzed separately. Expression databases contain much unexpected genetic knowledge that has not yet been reported in the literature. A total of 1642 human genes with unknown function were differentially expressed in at least 30 experiments.
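The abstract does not state which statistical test was used for the GO enrichment analysis. A common choice for this kind of question is the one-sided hypergeometric test, sketched below as an assumption rather than as the author's procedure. The function name `hypergeom_sf` and the numbers in the example are hypothetical.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """Probability of drawing at least k annotated genes when n genes are
    sampled without replacement from N genes, K of which carry the GO category."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible terms drop out of the sum automatically.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Toy example (assumed numbers): 10 genes in total, 3 annotated to a category,
# 2 co-expressed partners drawn.
p_at_least_one = hypergeom_sf(1, N=10, K=3, n=2)
p_both = hypergeom_sf(2, N=10, K=3, n=2)
print(p_at_least_one, p_both)
```

In practice the test would be run once per GO category over a gene's top co-expressed partners, and the resulting p-values would need correction for multiple testing.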

AVAILABILITY

Data matrix available upon request.
