Beck Tim, Free Robert C, Thorisson Gudmundur A, Brookes Anthony J
Department of Genetics, University of Leicester, University Road, Leicester, UK.
J Biomed Semantics. 2012 Dec 17;3(1):9. doi: 10.1186/2041-1480-3-9.
The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace. This affects the work of GWAS Central, a free and open-access resource for the advanced querying and comparison of summary-level genetic association data. The benefits of employing ontologies for standardising and structuring data are widely accepted. The complex spectrum of observed human phenotypes (and traits), and the requirement for cross-species phenotype comparisons, call for reflection on the most appropriate solution for the organisation of human phenotype data. Semantic Web standards offer a route to further integration of GWAS data and to contributing to the web of Linked Data.
A pragmatic consideration when applying phenotype ontologies to GWAS data is the ability to retrieve all data, at the most granular level possible, by querying a single ontology graph. We found the Medical Subject Headings (MeSH) terminology suitable for describing all traits (diseases and medical signs and symptoms) at various levels of granularity, and the Human Phenotype Ontology (HPO) most suitable for describing phenotypic abnormalities (medical signs and symptoms) at the most granular level. Diseases within MeSH are mapped to HPO to infer the phenotypic abnormalities associated with diseases. Building on this rich semantic phenotype annotation layer, we are able to make cross-species phenotype comparisons and publish a core subset of GWAS data as RDF nanopublications.
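The two ideas in this paragraph, retrieving every study annotated at or below a chosen term and inferring phenotypic abnormalities through a disease-to-phenotype mapping, can be illustrated with a minimal sketch. It is not the GWAS Central implementation. The term identifiers (`T:...`, `P:...`), the study names, and the three dictionaries are invented for illustration and are not real MeSH or HPO identifiers.

```python
# Hypothetical trait hierarchy (parent -> children), standing in for a
# terminology such as MeSH. All identifiers are invented.
CHILDREN = {
    "T:metabolic_disease": ["T:diabetes"],
    "T:diabetes": ["T:type1_diabetes", "T:type2_diabetes"],
}

# Hypothetical study annotations: each study is tagged with trait terms.
STUDY_ANNOTATIONS = {
    "study_1": {"T:type2_diabetes"},
    "study_2": {"T:diabetes"},
    "study_3": {"T:asthma"},
}

# Hypothetical disease -> phenotypic abnormality mapping, standing in for
# a MeSH-to-HPO mapping.
DISEASE_TO_PHENOTYPES = {
    "T:type2_diabetes": {"P:hyperglycemia", "P:insulin_resistance"},
}


def descendants(term, children=CHILDREN):
    """Return the term itself plus every term below it in the hierarchy."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children.get(current, []))
    return seen


def studies_for(term):
    """Studies annotated with the term or any more granular descendant."""
    wanted = descendants(term)
    return sorted(
        study for study, terms in STUDY_ANNOTATIONS.items() if terms & wanted
    )


def inferred_phenotypes(study):
    """Phenotypic abnormalities inferred from a study's disease annotations."""
    phenotypes = set()
    for term in STUDY_ANNOTATIONS[study]:
        phenotypes |= DISEASE_TO_PHENOTYPES.get(term, set())
    return phenotypes
```

Querying a broad term such as `T:metabolic_disease` returns studies annotated at any finer level beneath it, which is the practical benefit of working from a single graph.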
We present a methodology for applying phenotype annotations to a comprehensive genome-wide association dataset and for ensuring compatibility with the Semantic Web. The annotations are used to assist with cross-species genotype and phenotype comparisons. However, further processing and deconstruction of terms may be required to facilitate automatic phenotype comparisons. The provision of GWAS nanopublications opens a new dimension for exploring GWAS data, by way of intrinsic links to related data resources within the Linked Data web. The value of such annotation and integration will grow as more biomedical resources adopt the standards of the Semantic Web.
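To make the nanopublication idea concrete, the sketch below assembles the usual nanopublication layout (a head graph pointing to assertion, provenance, and publication-info graphs) as TriG-style text. It is a rough illustration under stated assumptions, not the format GWAS Central publishes. The `ex:` vocabulary, the function name `build_nanopub`, and all `example.org` URIs are hypothetical; only the `np:`, `prov:`, and `dcterms:` terms come from existing vocabularies.

```python
def build_nanopub(base, marker, trait, p_value, source, created):
    """Return TriG-style text for one marker-trait association.

    `base` is a hypothetical namespace for this nanopublication; the ex:
    predicates are placeholders, not an established vocabulary.
    """
    lines = [
        "@prefix np: <http://www.nanopub.org/nschema#> .",
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/vocab#> .",
        f"@prefix : <{base}> .",
        "",
        # Head graph: ties the three content graphs together.
        ":head {",
        "  : a np:Nanopublication ;",
        "    np:hasAssertion :assertion ;",
        "    np:hasProvenance :provenance ;",
        "    np:hasPublicationInfo :pubinfo .",
        "}",
        # Assertion graph: the scientific claim itself.
        ":assertion {",
        f"  :association ex:marker <{marker}> ;",
        f"    ex:trait <{trait}> ;",
        f'    ex:pValue "{p_value}"^^xsd:double .',
        "}",
        # Provenance graph: where the assertion came from.
        ":provenance {",
        f"  :assertion prov:wasDerivedFrom <{source}> .",
        "}",
        # Publication info graph: metadata about the nanopublication.
        ":pubinfo {",
        f'  : dcterms:created "{created}"^^xsd:dateTime .',
        "}",
    ]
    return "\n".join(lines)


example = build_nanopub(
    base="http://example.org/np/1#",
    marker="http://example.org/marker/rs0000",
    trait="http://example.org/trait/T1",
    p_value=3e-8,
    source="http://example.org/study/1",
    created="2012-12-17T00:00:00Z",
)
```

Because the trait is referenced by URI, an assertion of this kind can be linked to any other resource that uses the same phenotype identifier.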