BMC Bioinformatics. 2012 Jan 25;13 Suppl 1(Suppl 1):S9. doi: 10.1186/1471-2105-13-S1-S9.
Personalised medicine provides patients with treatments that are specific to their genetic profiles. It requires efficient sharing of disparate data types across a variety of scientific disciplines, such as molecular biology, pathology, radiology and clinical practice. Personalised medicine aims to offer the safest and most effective therapeutic strategy based on the genetic variations of each subject. This is particularly relevant in oncology, where knowledge about genetic mutations has already led to new therapies. Current molecular biology techniques (microarrays, proteomics, epigenetic technology and improved DNA sequencing technology) enable better characterisation of tumours. However, the vast amounts of data, coupled with the use of different terms in each discipline (semantic heterogeneity), make the retrieval and integration of information difficult.
Existing software infrastructures for data sharing in the cancer domain, such as caGrid, support access to distributed information. caGrid follows a service-oriented, model-driven architecture. Each data source in caGrid is associated with metadata at increasing levels of abstraction: syntactic, structural, reference and domain metadata. The domain metadata consists of ontology-based annotations associated with the structural information of each data source. However, caGrid's current querying functionality operates at the structural metadata level and does not capitalise on the ontology-based annotations. This paper presents the design of, and theoretical foundations for, distributed ontology-based queries over cancer research data. Concept-based queries are reformulated into the target query language, and join conditions between multiple data sources are found by exploiting the semantic annotations. The system has been implemented, as a proof of concept, over the caGrid infrastructure, and the approach is applicable to other model-driven architectures. A graphical user interface supporting ontology-based queries over caGrid data sources has been developed. An extensive evaluation of the query reformulation technique is included.
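The idea of finding join conditions through shared semantic annotations can be illustrated with a minimal sketch. This is not the paper's reformulation algorithm and it does not use the caGrid API. The service names, classes, attributes and concept labels below are all hypothetical, and the sketch assumes that each attribute in a source's structural model carries exactly one ontology concept.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Attribute:
    source: str   # hypothetical data service name
    cls: str      # class in that service's structural model
    name: str     # attribute name within the class
    concept: str  # hypothetical ontology concept annotating the attribute

    def path(self) -> str:
        return f"{self.source}.{self.cls}.{self.name}"


# Hypothetical annotations; real sources would publish these as domain metadata.
ANNOTATIONS = [
    Attribute("PathologyService", "Specimen", "patientId", "PatientIdentifier"),
    Attribute("PathologyService", "Specimen", "diagnosis", "Diagnosis"),
    Attribute("ImagingService", "Study", "subjectId", "PatientIdentifier"),
    Attribute("ImagingService", "Study", "modality", "ImagingModality"),
]


def reformulate(concepts, annotations):
    """Map queried concepts to per-source attributes and candidate joins."""
    # Step 1: resolve each queried concept to the attributes annotated with it.
    selections = {}
    for attr in annotations:
        if attr.concept in concepts:
            selections.setdefault(attr.source, []).append(attr.path())

    # Step 2: among the sources involved, pair attributes from different
    # sources that share a concept; each pair is a candidate join condition.
    involved = [a for a in annotations if a.source in selections]
    joins = [
        f"{a.path()} = {b.path()}"
        for a, b in combinations(involved, 2)
        if a.source != b.source and a.concept == b.concept
    ]
    return selections, joins


selections, joins = reformulate({"Diagnosis", "ImagingModality"}, ANNOTATIONS)
print(selections)
print(joins)
```

In this toy setup the query names only `Diagnosis` and `ImagingModality`, yet the two services can still be joined because both annotate an attribute with the same `PatientIdentifier` concept. A real system would also need to handle concept subsumption and ambiguous matches, which this sketch ignores.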
To support personalised medicine in oncology, it is crucial to retrieve and integrate molecular, pathology, radiology and clinical data efficiently. The semantic heterogeneity of the data makes this a challenging task. Ontologies provide a formal framework to support querying and integration. This paper provides an ontology-based solution for querying distributed databases over service-oriented, model-driven infrastructures.