Oellrich Anika, Smedley Damian
Mouse Informatics Group, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Database (Oxford). 2014 Mar 13;2014:bau017. doi: 10.1093/database/bau017. Print 2014.
Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3500) of these diseases are still without an identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human diseases. Targeted modifications have led to a vast amount of model organism data. However, these data are scattered across different databases, preventing an integrated view and missing out on contextual information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines and bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue that the gene is expressed in. However, in other cases, a systems-level approach is required to understand how perturbations to gene networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue-phenotype associations reveals that 72-76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55-64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between 'total body fat' abnormalities and genes expressed in the 'brain', which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that the use of our predicted tissue-phenotype associations can improve the detection of a known disease-gene association when combined with a disease gene candidate prediction tool.
For example, JAK2, the known gene associated with Familial Erythrocytosis 1, rises from the seventh-best candidate to the top hit when the associated tissues are taken into consideration. Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm/phenotype/list.
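The two evaluation figures in the abstract can look contradictory: most phenotypes are linked to disrupted genes expressed in the affected tissue, yet most individual phenotype-tissue associations are spatially separated. The difference lies in what is counted: a phenotype is counted once if any of its associated tissues is the affected one, whereas every association is counted separately. The following is a minimal Python sketch of that distinction on invented toy data. It assumes that each phenotype has a single affected tissue and that a phenotype counts as co-localised if at least one associated tissue matches it. The function name, phenotype labels and tissue labels are hypothetical, and this is not the paper's actual evaluation procedure.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): predicted (phenotype, tissue) associations,
# where the tissue is one in which the disrupted genes are expressed.
associations = [
    ("abnormal liver morphology", "liver"),
    ("abnormal liver morphology", "kidney"),
    ("increased total body fat", "brain"),
    ("increased total body fat", "adipose tissue"),
    ("abnormal retina morphology", "retina"),
    ("abnormal retina morphology", "brain"),
    ("abnormal heart rate", "brain"),
]

# Assumption: each phenotype manifests in exactly one affected tissue.
affected_tissue = {
    "abnormal liver morphology": "liver",
    "increased total body fat": "adipose tissue",
    "abnormal retina morphology": "retina",
    "abnormal heart rate": "heart",
}


def colocalisation_summary(associations, affected_tissue):
    """Return (fraction of phenotypes with at least one same-tissue association,
    fraction of individual associations whose tissue differs from the affected tissue)."""
    tissues_by_phenotype = defaultdict(set)
    for phenotype, tissue in associations:
        tissues_by_phenotype[phenotype].add(tissue)

    # Per-phenotype view: one hit in the affected tissue is enough.
    supported = sum(
        1 for phenotype, tissues in tissues_by_phenotype.items()
        if affected_tissue[phenotype] in tissues
    )
    per_phenotype = supported / len(tissues_by_phenotype)

    # Per-association view: every association is judged on its own.
    separated = sum(
        1 for phenotype, tissue in associations
        if tissue != affected_tissue[phenotype]
    )
    per_association = separated / len(associations)
    return per_phenotype, per_association


if __name__ == "__main__":
    co_localised, spatially_separated = colocalisation_summary(associations, affected_tissue)
    print(f"phenotypes with a same-tissue association: {co_localised:.0%}")        # 75%
    print(f"associations that are spatially separated: {spatially_separated:.0%}")  # 57%
```

On this toy input, three of four phenotypes have a same-tissue association while four of seven associations point elsewhere, so both statements can hold at once.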