Wren Jonathan D
Arthritis and Immunology Research Program, Oklahoma Medical Research Foundation, 825 N.E. 13th Street, Oklahoma City, OK 73104-5005, USA.
Bioinformatics. 2009 Jul 1;25(13):1694-701. doi: 10.1093/bioinformatics/btp290. Epub 2009 May 15.
Approximately 9334 (37%) of human genes have no publications documenting their function and, for those that are published, the number of publications per gene is highly skewed. Furthermore, for reasons that remain unclear, the entry of new gene names into the literature has slowed in recent years. If we are to better understand human/mammalian biology and complete the catalog of human gene function, it is important to finish predicting putative functions for these genes based upon existing experimental evidence.
A global meta-analysis (GMA) of all publicly available GEO two-channel human microarray datasets (3551 experiments in total) was conducted to identify genes with recurrent, reproducible patterns of co-regulation across different conditions. Patterns of co-expression were divided into parallel (i.e. genes are up- and down-regulated together) and anti-parallel. Several ranking methods to predict a gene's function based on its top 20 co-expressed gene pairs were compared. In the best method, 34% of predicted Gene Ontology (GO) categories matched exactly with the known GO categories for approximately 5000 genes analyzed, versus only 3% for random gene sets. Only 2.4% of co-expressed gene pairs were found as co-occurring gene pairs in MEDLINE.
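The general idea of splitting co-expression into parallel and anti-parallel partners and then predicting GO categories from the top-ranked partners can be illustrated with a small sketch. The abstract does not state the similarity measure or the ranking scheme, so Pearson correlation and simple vote counting are assumptions made here for illustration only. The gene names, GO identifiers, and expression values are hypothetical toy data, not results from the study.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(vx * vy)


def top_coexpressed(query, profiles, k=20):
    """Return the top-k parallel and top-k anti-parallel partners of `query`.

    Assumption: positive correlation = parallel, negative = anti-parallel.
    """
    scores = [
        (gene, pearson(profiles[query], profile))
        for gene, profile in profiles.items()
        if gene != query
    ]
    parallel = sorted((s for s in scores if s[1] > 0), key=lambda s: -s[1])[:k]
    anti = sorted((s for s in scores if s[1] < 0), key=lambda s: s[1])[:k]
    return parallel, anti


def predict_go(neighbors, go_annotations):
    """Rank GO categories by how many neighbors carry them (simple voting)."""
    votes = Counter()
    for gene, _score in neighbors:
        votes.update(go_annotations.get(gene, ()))
    return votes.most_common()


# Hypothetical toy data: log-ratios across four experiments.
profiles = {
    "GENE_X": [1.0, -1.0, 2.0, -2.0],   # uncharacterized query gene
    "GENE_A": [0.9, -1.1, 2.1, -1.8],
    "GENE_B": [-1.0, 1.0, -2.0, 2.0],
    "GENE_C": [0.5, -0.4, 1.0, -1.2],
}
go_annotations = {
    "GENE_A": {"GO:0006955"},
    "GENE_B": {"GO:0007049"},
    "GENE_C": {"GO:0006955", "GO:0008283"},
}

parallel, anti = top_coexpressed("GENE_X", profiles, k=20)
predictions = predict_go(parallel, go_annotations)
```

Keeping the two partner lists separate mirrors the distinction drawn in the abstract between parallel and anti-parallel co-expression.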
Via a GO enrichment analysis, genes co-expressed in parallel with the query gene were frequently associated with the same GO categories, whereas anti-parallel genes were not. Combining parallel and anti-parallel genes for analysis resulted in fewer significant GO categories, suggesting they are best analyzed separately. Expression databases contain much unexpected genetic knowledge that has not yet been reported in the literature. A total of 1642 human genes with unknown function were differentially expressed in at least 30 experiments.
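A GO enrichment analysis asks whether a GO category appears among a gene's co-expressed partners more often than chance would predict. The abstract does not say which statistical test was used. The sketch below assumes a one-sided hypergeometric test, which is a common choice for this kind of analysis, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, drawn, hits):
    """P(X >= hits) for a hypergeometric draw.

    population: total number of genes considered
    annotated:  genes in the population carrying the GO category
    drawn:      size of the co-expressed gene list
    hits:       genes in the list carrying the GO category
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 genes, 5 annotated, a list of 5 with all 5 annotated.
p_enriched = hypergeom_pvalue(population=20, annotated=5, drawn=5, hits=5)
```

Under this assumption, running the test separately on the parallel and anti-parallel lists would correspond to the separate analysis the abstract recommends.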
Data matrix available upon request.