Oti Martin, van Reeuwijk Jeroen, Huynen Martijn A, Brunner Han G
Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26-28, 6525 GA, Nijmegen, The Netherlands.
BMC Bioinformatics. 2008 Apr 23;9:208. doi: 10.1186/1471-2105-9-208.
Genes that are co-expressed tend to be involved in the same biological process. However, co-expression is not a very reliable predictor of functional links between genes. The evolutionary conservation of co-expression between species can be used to predict protein function more reliably than co-expression in a single species. Here we examine whether co-expression across multiple species is also a better prioritizer of disease genes than is co-expression between human genes alone.
We use co-expression data from yeast (S. cerevisiae), nematode worm (C. elegans), fruit fly (D. melanogaster), mouse and human and find that the use of evolutionary conservation can indeed improve the predictive value of co-expression. Genes causing the same disease tend to show higher co-expression with each other than with other genes from their associated disease loci, and this effect is significantly enhanced when co-expression data are combined across evolutionarily distant species. We also find that performance can vary significantly depending on the co-expression datasets used, and simply using more data does not necessarily lead to better prioritization. Instead, we find that dataset quality is more important than quantity: using a consistent microarray platform per species leads to better performance than using more inclusive datasets pooled from various platforms.
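The prioritization idea above (rank the genes in a disease locus by how strongly they are co-expressed with genes known to cause the same disease, with co-expression evidence pooled across species) can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the abstract: Pearson correlation is used as the co-expression measure, evidence is combined by simply summing per-species correlations over ortholog pairs, each candidate is scored by its best match among the known disease genes, and the ortholog map, toy profiles, and function names (`conserved_coexpression`, `prioritize`) are all hypothetical. It is not the authors' method.

```python
def pearson(x, y):
    # Pearson correlation between two equal-length expression profiles
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def conserved_coexpression(gene_a, gene_b, species_data, orthologs):
    # Assumption: sum the co-expression of the two genes' orthologs over every
    # species in which both genes have an ortholog with an expression profile.
    total = 0.0
    for species, profiles in species_data.items():
        a = orthologs.get(species, {}).get(gene_a)
        b = orthologs.get(species, {}).get(gene_b)
        if a in profiles and b in profiles:
            total += pearson(profiles[a], profiles[b])
    return total


def prioritize(locus_genes, known_disease_genes, species_data, orthologs):
    # Assumption: score each candidate by its highest conserved co-expression
    # with any gene already known to cause the same disease, then rank.
    scores = {
        gene: max(
            conserved_coexpression(gene, known, species_data, orthologs)
            for known in known_disease_genes
        )
        for gene in locus_genes
    }
    return sorted(locus_genes, key=lambda g: scores[g], reverse=True)


# Hypothetical toy data: expression profiles per species, keyed by gene ID
species_data = {
    "human": {"H1": [1, 2, 3, 4], "H2": [1, 2, 3, 5], "H3": [4, 3, 2, 1]},
    "mouse": {"m1": [2, 4, 6, 8], "m2": [1, 3, 5, 7], "m3": [8, 6, 4, 2]},
}
# Hypothetical ortholog map: human gene ID -> gene ID in each species
orthologs = {
    "human": {"H1": "H1", "H2": "H2", "H3": "H3"},
    "mouse": {"H1": "m1", "H2": "m2", "H3": "m3"},
}

ranking = prioritize(["H3", "H2"], ["H1"], species_data, orthologs)
print(ranking)
```

With this toy data, H2 is positively co-expressed with the known disease gene H1 in both species while H3 is anti-correlated in both, so the script prints `['H2', 'H3']`. Summing across species rewards candidates whose co-expression with known disease genes is conserved; a real analysis would also need to address the dataset-quality and platform-consistency issues noted above.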
We find that evolutionarily conserved gene co-expression prioritizes disease candidate genes better than human gene co-expression alone, and provide the integrated data as a new resource for disease gene prioritization tools.