Université de Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France.
LIRMM, Université de Montpellier, CNRS, Montpellier, 34095, France.
BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):139. doi: 10.1186/s12859-019-2693-9.
Pharmacogenomics (PGx) studies how genomic variations impact variations in drug response phenotypes. Knowledge in pharmacogenomics typically consists of units in the form of ternary relationships: gene variant - drug - adverse event. Such a relationship states that an adverse event may occur in patients who carry the specified gene variant and are exposed to the specified drug. State-of-the-art PGx knowledge is mainly available in reference databases such as PharmGKB and reported in the biomedical scientific literature. However, PGx knowledge can also be discovered from clinical data, such as Electronic Health Records (EHRs). In this case, it may either correspond to new knowledge or confirm state-of-the-art knowledge that lacks a "clinical counterpart" or validation. For this reason, knowledge units from distinct sources need to be compared automatically.
In this article, we propose an approach based on Semantic Web technologies to represent and compare PGx knowledge units. To this end, we developed PGxO, a simple ontology that represents PGx knowledge units and their components. Combined with PROV-O, an ontology developed by the W3C to represent provenance information, PGxO makes it possible to encode PGx relationships and associate provenance information with them. Additionally, we introduce a set of rules to reconcile PGx knowledge, i.e., to identify whether two relationships, potentially expressed using different vocabularies and at different levels of granularity, refer to the same knowledge unit or to different ones. We evaluated our ontology and rules by populating PGxO with knowledge units extracted from PharmGKB (2701), from the literature (65,720), and from discoveries reported in EHR analysis studies (only 10, manually extracted), and by testing the similarity of these units. We named the resulting knowledge base, which represents and reconciles knowledge units of these various origins, PGxLOD (PGx Linked Open Data).
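The abstract does not spell out the reconciliation rules, so the following is only a minimal sketch, assuming a much simpler setting than PGxO: knowledge units are plain Python records rather than RDF/OWL individuals, provenance is a single `source` string standing in for PROV-O, and the vocabulary mappings (`MAPPINGS`), the is-a hierarchy (`PARENT`), all identifiers, and the four outcome labels are hypothetical. It illustrates the general idea of comparing two ternary relationships after mapping their components to a shared vocabulary and then checking component-wise subsumption to account for different levels of granularity.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cross-vocabulary mappings: source-specific term -> shared concept.
MAPPINGS = {
    "pharmgkb:warfarin": "drug:warfarin",
    "lit:coumadin": "drug:warfarin",
}

# Hypothetical is-a hierarchy over shared concepts: concept -> parent concept.
PARENT = {
    "gene:CYP2C9*3": "gene:CYP2C9",
    "pheno:gastrointestinal_hemorrhage": "pheno:hemorrhage",
}


@dataclass(frozen=True)
class KnowledgeUnit:
    """Hypothetical encoding of a gene variant - drug - adverse event relationship."""
    variant: str
    drug: str
    phenotype: str
    source: str  # stand-in for PROV-O provenance (e.g. "PharmGKB", "EHR study")


def normalize(term: str) -> str:
    """Map a source-specific term to the shared vocabulary, if a mapping exists."""
    return MAPPINGS.get(term, term)


def subsumed_by(specific: str, general: str) -> bool:
    """True if `specific` equals `general` or is one of its descendants."""
    node: Optional[str] = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


def components(unit: KnowledgeUnit) -> tuple:
    """Normalized (variant, drug, phenotype) triple; provenance is ignored."""
    return tuple(normalize(t) for t in (unit.variant, unit.drug, unit.phenotype))


def reconcile(a: KnowledgeUnit, b: KnowledgeUnit) -> str:
    """Compare two units component by component after normalization."""
    ca, cb = components(a), components(b)
    if ca == cb:
        return "same"
    if all(subsumed_by(x, y) for x, y in zip(ca, cb)):
        return "more_specific"  # every component of a is at least as specific
    if all(subsumed_by(y, x) for x, y in zip(ca, cb)):
        return "more_general"
    return "different"


# Usage: an EHR-derived unit compared with a reference-database unit.
ehr = KnowledgeUnit("gene:CYP2C9*3", "lit:coumadin",
                    "pheno:gastrointestinal_hemorrhage", "EHR study")
ref = KnowledgeUnit("gene:CYP2C9", "pharmgkb:warfarin",
                    "pheno:hemorrhage", "PharmGKB")
print(reconcile(ehr, ref))  # more_specific
```

Keeping provenance out of `components` means two units from different sources can still be recognized as the same relationship, which is the situation where clinical data would confirm reference knowledge.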
The proposed ontology and reconciliation rules constitute a first step toward a more complete framework for knowledge comparison in PGx. In this direction, PGxLOD, the experimental instantiation of PGxO, illustrates both the possibilities and the difficulties of reconciling various existing knowledge sources.