Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK.
Hum Mutat. 2012 May;33(5):813-6. doi: 10.1002/humu.22079. Epub 2012 Apr 6.
There is an increasing accumulation of data on disease-related mutations and associated phenotypes in a wide variety of databases worldwide. Exploiting these data in the context of whole-genome sequencing is inhibited because the phenotype information in these databases is often difficult to search meaningfully or relate between data sets, and automated computational integration is not possible. Key to this integration is the development of ontology-based methods for describing diseases in terms of their component phenotypes. This would allow analysis of variation in disease manifestation, relationships between diseases and phenotypes in model organisms, and linking diseases to gene mutations, pathways, and phenotypes. Building a systematic link to phenotypes manifested in model organisms will be of particular importance with the advent of new, large-scale phenotyping projects such as the International Mouse Phenotyping Consortium. In addition to improved semantic description, funding and organizational innovations are required to support this integration. In particular, a series of national or international hubs is needed to hold genotype and phenotype data and feed them to a central database. Better coordination of clinical and bioinformatics experts and, crucially, development of a transnational funding and international coordination infrastructure will also be required.